Abstract

We propose a finite sample based predictor for estimated linear one dimensional time series 
models and compute the associated total forecasting error. The expression for the error that we 
present takes into account the estimation error. Unlike existing solutions in the literature, our 
formulas require neither assumptions on the second order stationarity of the sample nor Monte 
Carlo simulations for their evaluation. This result is used to prove the pertinence of a new hybrid 
scheme that we put forward for the forecast of linear temporal aggregates. This novel strategy 
consists of carrying out the parameter estimation based on disaggregated data and the prediction 
based on the corresponding aggregated model and data. We show that in some instances this 
scheme has a better performance than the "all-disaggregated" approach presented as optimal in the 
literature. 
1 Introduction

The success of parametric time series models as a tool of choice in many research fields is due in part 
to their good performance when it comes to empirical forecasting based on historical samples. Once a 
data generating process (DGP) has been selected and estimated for the forecasting problem at hand, 
there is a variety of well studied forecasting procedures and algorithms available in the literature. The 
most widespread loss function used in the construction of predictors is the mean square forecasting
error (MSFE); see the monographs [Bro06, BD02, Ham94, L05] and references therein for detailed
presentations of the available MSFE minimization-based techniques. This is the approach to prediction
that we follow in this work; the reader is referred to [Gra69] or Section 4.2 in [GN86] for forecasting
techniques based on other optimality criteria.

The stochastic nature of the time series models that we consider implies that the forecasts produced
with them carry an error that cannot be reduced even if the parameters of the model
are known with total precision; we refer to this as the characteristic error of the model. Additionally,
all that is known in most applications is a historical sample of the variable to be forecast,
out of which a model needs to be selected and estimated. There are well-known techniques to implement
this, which are also stochastic in nature and which increase the total error committed when computing a
forecast; in that case we speak of model selection error and estimation error. The errors that
one incurs when carrying out a forecasting task are of different natures, and much effort has
been dedicated in the literature to quantifying them in the case of linear multivariate VARMA
processes.

Most results obtained in this direction concern the combination of the estimation and
the characteristic errors; this compound error is always studied assuming independence between the
realizations of the model that are used for estimation and the ones used for prediction; we refer the
reader to [Bai79, Rei80, Yam80, Yam81, Duf85, BS86, L87, SH88]. Explicit expressions for these errors
in the VARMA context are available in the monograph [L05]. Indeed, if we assume that the sample
out of which we want to forecast is a realization of the unique stationary solution of a VAR model, this
error can be written down [L05, page 97] using the time-independent autocovariance of the process;
the situation in the VARMA context is more complicated and the expression provided [L05, page 490]
requires Monte Carlo simulations for its estimation.

The knowledge regarding the error associated to model selection is much more rudimentary and
research on this subject seems to be in a more primitive state. A good description of the state of the art
can be found in [L05, page 318] as well as in [L86b, page 89], and references therein. We do not consider
this source of forecasting error in our work and hence in the sequel we will use the denomination total
error to refer to the combination of the characteristic and the estimation errors.

In this paper we concentrate on one dimensional linear processes, a subclass of which is the ARMA 
family. The first contribution in this paper is the formulation of a MSFE based predictor that takes 
as ingredients a finite sample and the coefficients of a linear model estimated on it, as well as the 
computation of the corresponding total error. The main improvements that we provide with respect to 
preexisting work on this question are: 

• We make no hypothesis on the second order stationarity of the sample at hand; in other words, 
we do not assume that the sample is a realization of the stationary solution of the recursions 
that define the model. Such a hypothesis is extremely difficult to test in small and finite sample 
contexts and it is hence of much interest to be able to avoid it. 

• The expression for the total forecasting error is completely explicit and does not require the use 
of Monte Carlo simulations. 

The interplay between the characteristic error, the estimation error, and the forecasting horizon is highly
nonlinear and can produce surprising phenomena. For example, as is well known, the characteristic
error is an increasing function of the horizon, that is, the further into the future we forecast, the more
error we are likely to commit. When we take into account the estimation error, the total error may
decrease with the forecast horizon! We study this finite sample related phenomenon with the total error
formula introduced in Theorem 3.3 and illustrate it with an example in Section 5.1.

The characterization of the total forecasting error that we described serves as a basis for the
second main theme of this paper, namely, the interplay between multistep forecasting, the prediction
of temporal aggregates, and the use of temporal aggregation and estimation based techniques to
lower the total forecasting error. In this part of the paper we work strictly in the ARMA context.
The temporal aggregation of ARMA processes is a venerable topic that is by now well understood
[AW72, Tia72, Bre73, TW76, Wei79, SW86, Wei06, SV08] and has been extensively studied and
exploited in the context of forecasting [Abr82, L86b, L86a, L87, L89a, L89b, RS95, L06, L09, L10],
mainly by H. Lütkepohl.

A recurrent question in this setup consists of determining the most efficient way to compute multistep
forecasts or, more generally, predictions of linear temporal aggregates of a given time series. More
specifically, given a sample and an underlying model, we can imagine at least two ways to construct an
$h$ time steps ahead forecast or, more generally, a forecast of a linear combination of the values of the
time series over the next $h$ steps. First, we can simply compute the $h$ time steps ahead forecasts of the time series
out of the original disaggregated sample and determine the needed aggregate prediction out of them;
another possibility would be to temporally aggregate the sample and the time series model in such a way
that the required forecast becomes a one time step ahead forecast for the new aggregated sample and
model. If we do not take into consideration estimation errors and we only consider the characteristic
error, there is a general result that states that the forecast based on high frequency disaggregated data
has an associated error that is smaller than or equal to the one associated to the aggregated sample and
model (we will recall it in Proposition 4.2). In the VARMA context, H. Lütkepohl [L86b, L87, L09]
has characterized the situations in which there is no loss of forecasting efficiency when working with
temporally aggregated ingredients.

When estimation errors are taken into account, the inequality that we just described becomes
strict [L86b, L87], that is, forecasts based on models estimated using the disaggregated high-frequency
samples always perform better than those based on models estimated using aggregated data. This is
so even in the situations described in [L86b, L87, L09] for which the characteristic errors associated to
the use of the aggregated and the disaggregated models are identical; this is intuitively very reasonable
due to the smaller sample size associated to the aggregated situation, which automatically causes an
increase in the estimation error.



In Section 4.3 we propose a forecasting scheme that is a hybrid between the two strategies that 
we just described. We first use the high frequency data for estimating a model. Then, we temporally 
aggregate the data and the model and finally forecasting is carried out based on these two aggregated 
ingredients. We will show that this scheme presents two major advantages: 

• The model parameters are estimated using all the information available with the bigger sample 
size provided by the disaggregated data. Moreover, these parameters can be updated as new high 
frequency data becomes available. 

• In some situations, the total error committed using this hybrid forecasting scheme is smaller than 
the one associated to the forecast based on the disaggregated data and model and hence our 
strategy becomes optimal. Examples in this direction for both stock and flow temporal aggregates 
are presented in Section 5. The increase in performance obtained with our method comes from
minimizing the estimation error; given that the contribution of this error to the total one for 
univariate time series models is usually small for sizeable samples, the differences in forecasting 
performance that we will observe in practice are moderate. As we will show in a forthcoming 
work, this is likely to be different in the multivariate setup where in many cases, the estimation 
error is the main source of error. 

To our knowledge, this forecasting scheme has not been previously investigated in the literature 
and the improvement stated in the last point seems to be the first substantial application of temporal 
aggregation techniques in the enhancing of forecasting efficiency. 



2 Finite sample forecasting of linear processes 

In this section we introduce notations and definitions used throughout the paper and describe the 
framework in which we work. Additionally, since we are interested in finite sample based forecasting, we 
spell out in detail the predictors as well as the information sets on which our constructions are based. 

2.1 Linear processes 

Let $\varepsilon = \{\varepsilon_t\}_{t=-\infty}^{\infty}$ be a set of independent and identically distributed random variables with mean zero
and variance $\sigma^2$. We will write in short

$$\varepsilon = \{\varepsilon_t\} \sim \mathrm{IID}(0, \sigma^2).$$

We say that $X = \{X_t\}_{t=-\infty}^{\infty}$ is a linear causal process whenever it can be represented as

$$X_t = \sum_{i=0}^{\infty} \psi_i \varepsilon_{t-i}, \tag{2.1}$$

where $\{\psi_i\}_{i=0}^{\infty}$ is a set of real constants such that $\sum_{i=0}^{\infty} |\psi_i| < \infty$. Expression (2.1) can also be rewritten
as

$$X = \Psi(L)\varepsilon,$$

where $L$ is the backward shift operator and $\Psi(z)$ is the power series $\Psi(z) = \sum_{i=0}^{\infty} \psi_i z^i$. The process
$X$ defined in (2.1) is called invertible if there exist constants $\{\pi_j\}_{j=0}^{\infty}$ such that $\sum_{j=0}^{\infty} |\pi_j| < \infty$ and

$$\varepsilon_t = \sum_{j=0}^{\infty} \pi_j X_{t-j}, \quad \text{for all } t \in \mathbb{Z}, \tag{2.2}$$

or equivalently, $\varepsilon = \Pi(L) X$, where $\Pi(z)$ is the power series $\Pi(z) = \sum_{j=0}^{\infty} \pi_j z^j$. $\Psi(L)$ and $\Pi(L)$ can
also be referred to as causal linear filter and invertible linear filter, respectively.

2.2 Finite sample forecasting of causal and invertible ARMA processes 

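Consider the causal and invertible ARMA($p$, $q$) specification determined by the equivalent relations

$$\Phi(L) X_t = \Theta(L)\varepsilon_t, \qquad X_t = \sum_{i=0}^{\infty} \psi_i \varepsilon_{t-i}, \qquad \varepsilon_t = \sum_{j=0}^{\infty} \pi_j X_{t-j}. \tag{2.3}$$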
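The passage between the ARMA coefficients and the weights $\psi_i$, $\pi_j$ of the causal and invertible representations can be sketched numerically. The following is a minimal illustration, not the authors' implementation; it assumes the sign conventions $\Phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$ and $\Theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$ used in the examples of this section, and the function names `psi_weights` and `pi_weights` are hypothetical.

```python
import numpy as np

def psi_weights(phi, theta, n):
    """Return psi_0..psi_n, the power series coefficients of Theta(z)/Phi(z).

    Assumed conventions: Phi(z) = 1 - sum_k phi_k z^k, Theta(z) = 1 + sum_k theta_k z^k.
    """
    phi, theta = list(phi), list(theta)
    psi = np.zeros(n + 1)
    psi[0] = 1.0
    for i in range(1, n + 1):
        # Matching coefficients of z^i in Phi(z) * Psi(z) = Theta(z).
        acc = theta[i - 1] if i <= len(theta) else 0.0
        for k in range(1, min(i, len(phi)) + 1):
            acc += phi[k - 1] * psi[i - k]
        psi[i] = acc
    return psi

def pi_weights(phi, theta, n):
    """Return pi_0..pi_n, the power series coefficients of Phi(z)/Theta(z).

    Phi/Theta has the same structure as Theta/Phi with the roles of the two
    polynomials exchanged and the signs of the coefficients flipped.
    """
    return psi_weights([-t for t in theta], [-p for p in phi], n)

# ARMA(1,1) with phi = 0.5 and theta = 0.3.
psi = psi_weights([0.5], [0.3], 5)
pi = pi_weights([0.5], [0.3], 5)
```

Since $\Psi(z)\Pi(z) = 1$, the convolution of the two weight sequences should reproduce $(1, 0, 0, \ldots)$ up to the truncation order, which gives a quick sanity check.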



The innovations process $\varepsilon = \{\varepsilon_t\}$ can be either independent and identically distributed IID$(0, \sigma^2)$
or white noise WN$(0, \sigma^2)$. In this subsection we focus on how to forecast out of a finite sample
$\xi_T = \{x_1, \ldots, x_T\}$ that satisfies the relations (2.3) and that has been generated out of a presample $\{x_{1-p}, \ldots, x_0\}$
and a preinnovations set $\{\varepsilon_{1-q}, \ldots, \varepsilon_0\}$. A standard way to solve this problem [Bro06, BD02] consists of
assuming that $\xi_T$ is a realization of the unique stationary process $X$ that satisfies the ARMA relations
(2.3) and using its time independent autocovariance function to formulate a linear
system of equations whose solution provides the linear projection $\widehat{X}_{T+h}$ of the random variable $X_{T+h}$
onto $\{x_1, \ldots, x_T\}$ using the $L^2$ norm; this projection $\widehat{X}_{T+h}$ minimizes the mean square error. We recall
that writing the unique stationary solution of (2.3) usually requires knowledge about the infinite past
history of the process. For example, for an AR(1) model of the form $X_t - \phi X_{t-1} = \varepsilon_t$, the unique
stationary solution is given by $X_t = \sum_{i=0}^{\infty} \phi^i \varepsilon_{t-i}$.

Given that we are concentrating on the finite sample context, we prefer to avoid the
stationarity hypothesis and the use of the corresponding autocovariance functions and to exploit in the
forecast only the information that is strictly available, that is:

(i) The model specification (2.3): we assume that the model parameters are known with certainty and
we neglect estimation errors.

(ii) The sample $\xi_T = \{x_1, \ldots, x_T\}$.

(iii) The presample $\{x_{1-p}, \ldots, x_0\}$ and preinnovations $\{\varepsilon_{1-q}, \ldots, \varepsilon_0\}$ that have been used in the sample
generation.

We now define the preset $I$ as

$$I := \begin{cases} \{x_{1-p}, \ldots, x_0\} \cup \{\varepsilon_{1-q}, \ldots, \varepsilon_0\}, & \text{when } p, q \neq 0, \\ \{x_{1-p}, \ldots, x_0\}, & \text{when } q = 0, \\ \{\varepsilon_{1-q}, \ldots, \varepsilon_0\}, & \text{when } p = 0. \end{cases}$$
Let $r = \max\{p, q\}$ and define the enlarged preset $I^*$ as

$$I^* := \{x_{1-r}, \ldots, x_0\} \cup \{\varepsilon_{1-r}, \ldots, \varepsilon_0\},$$

where:

• if $p > q$: $r = p$ and $\varepsilon_t := \sum_{j=0}^{t+p-1} \pi_j x_{t-j}$, $1-p \le t < 1-q$;

• if $q > p$: $r = q$ and $x_t := \sum_{i=0}^{t+q-1} \psi_i \varepsilon_{t-i}$, $1-q \le t < 1-p$;

• if $q = p$: $I = I^*$.

The enlarged preset $I^*$ is defined as a function of the elements in $I$. Consequently, the sigma-algebras
$\sigma(I)$ and $\sigma(I^*)$ generated by $I$ and $I^*$, respectively, coincide, that is, $\sigma(I) = \sigma(I^*)$.

The following result is basically known (see for example [Ham94, L05]) but we include it in order to
be explicit and self-contained about the forecasting scheme that we are using in the rest of the paper 
and also to spell out the peculiarities of the finite sample setup in which we are working. We include a 
brief proof in the appendix. 

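Proposition 2.1 In the conditions that we just described:

(i) The information sets (sigma algebras) $\sigma(\varepsilon_T) := \sigma(I, \varepsilon_1, \ldots, \varepsilon_T)$ and $\sigma(\xi_T) := \sigma(I, x_1, \ldots, x_T)$
generated by the preset and the past histories of the innovations $\varepsilon_T := \{\varepsilon_1, \ldots, \varepsilon_T\}$ and the sample
$\xi_T := \{x_1, \ldots, x_T\}$ coincide, that is,

$$\sigma(\xi_T) = \sigma(\varepsilon_T). \tag{2.4}$$

(ii) If the innovations process is IID (respectively WN) then the optimal multistep forecast $\widehat{X}_{T+h}$
(respectively optimal linear forecast) based on $\sigma(\xi_T)$ that minimizes the mean square forecasting
error (MSFE) $E\big[(X_{T+h} - \widehat{X}_{T+h})^2\big]$ is

$$\widehat{X}_{T+h} = \sum_{i=h}^{T+h-1+r} \psi_i \varepsilon_{T+h-i} = \sum_{i=0}^{T-1+r} \psi_{i+h} \varepsilon_{T-i} = \sum_{i=0}^{T-1+r} \sum_{j=0}^{T-i-1+r} \psi_{i+h} \pi_j x_{T-i-j}. \tag{2.5}$$

(iii) The MSFE associated to this forecast is

$$\mathrm{MSFE}(\widehat{X}_{T+h}) = \sigma^2 \sum_{i=0}^{h-1} \psi_i^2. \tag{2.6}$$

(iv) For ARMA models, the forecasts constructed in (2.5) for different horizons with respect to the
same information set $\mathcal{F}_T := \sigma(\varepsilon_T) = \sigma(\xi_T)$ satisfy the following recursive formula:

$$\widehat{X}_{T+h} = \begin{cases} \phi_1 \widehat{X}_{T+h-1} + \cdots + \phi_p \widehat{X}_{T+h-p} + \theta_h \varepsilon_T + \cdots + \theta_q \varepsilon_{T+h-q}, & q \ge h, \\ \phi_1 \widehat{X}_{T+h-1} + \cdots + \phi_p \widehat{X}_{T+h-p}, & q < h. \end{cases} \tag{2.7}$$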
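The forecast (2.5) and the error (2.6) can be sketched in a few lines. This is only an illustrative sketch under the assumption that the weights $\psi_i$, $\pi_j$ and the enlarged presample are already available as arrays; the function names and the array layout (enlarged presample followed by the sample) are hypothetical choices made here, not part of the original text.

```python
import numpy as np

def finite_sample_forecast(x, psi, pi, h, r):
    """Forecast X_{T+h} from x = (x_{1-r}, ..., x_0, x_1, ..., x_T).

    psi must contain at least len(x) + h weights and pi at least len(x) weights.
    """
    x = np.asarray(x, dtype=float)
    n = x.size  # n = T + r
    # eps[m] is the innovation at time t = m + 1 - r, reconstructed from all
    # the observations available up to that time (enlarged presample included).
    eps = np.array([sum(pi[j] * x[m - j] for j in range(m + 1)) for m in range(n)])
    # sum_{i=h}^{T+h-1+r} psi_i * eps_{T+h-i}; time T+h-i sits at index n-1+h-i.
    return sum(psi[i] * eps[n - 1 + h - i] for i in range(h, n + h))

def characteristic_msfe(psi, h, sigma2=1.0):
    """Characteristic error sigma^2 * sum_{i<h} psi_i^2."""
    return sigma2 * float(np.sum(np.asarray(psi[:h], dtype=float) ** 2))

# MA(1) with theta = 0.5, enlarged presample x_0 = eps_0 = 0.2 and sample x_1 = 1.0.
theta = 0.5
psi = np.array([1.0, theta, 0.0, 0.0, 0.0])
pi = np.array([(-theta) ** j for j in range(5)])
one_step = finite_sample_forecast([0.2, 1.0], psi, pi, h=1, r=1)
two_step = finite_sample_forecast([0.2, 1.0], psi, pi, h=2, r=1)
```

In this toy case the one-step forecast equals $\theta(x_1 - \theta x_0)$ and the two-step forecast vanishes, as expected for an MA(1) model.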

Remark 2.2 Testing the stationarity of small or, in general, finite samples is a difficult task in practice.
We emphasize that the prediction in Proposition 2.1 does not require any stationarity hypothesis. Moreover,
we underline that the forecast (2.5) coincides in general neither with the standard linear
forecast for second order stationary series that uses the corresponding time independent autocovariance
function (see for example [Bro06, page 63]), nor with the usual finite sample approximation to the
optimal forecast (see [Ham94, page 85]). The main difference with the latter is that the
innovations associated to the presample are not assumed to be equal to zero but are reconstructed
out of it so that there is no loss of information. In Examples 2.3 and 2.4 below we show how our
forecast allows us to construct a predictor that:

(i) is different from the one obtained assuming stationarity;

(ii) has a better performance in terms of characteristic forecasting error.

These statements do not generalize to arbitrary ARMA models; for example, for pure AR models, the
predictor that we propose and those cited above coincide. ■

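Example 2.3 Finite sample forecasting for the MA(1) process.
We consider the MA(1) model

$$X_t = \varepsilon_t + \theta \varepsilon_{t-1} \tag{2.8}$$

and the trivial sample consisting of just one value $x_1$ at time $t = 1$; this sample is generated by the
preset $I = \{\varepsilon_0\}$ and the innovation $\varepsilon_1$. In this case, the enlarged preset is $I^* = \{x_0, \varepsilon_0\}$ with $x_0 = \varepsilon_0$.
Moreover, we have

• $\psi_0 = 1$, $\psi_1 = \theta$ and $\psi_i = 0$, for any integer $i > 1$,

• $\pi_0 = 1$, $\pi_j = (-1)^j \theta^j$, for any integer $j \ge 1$.

Consequently by (2.5), the forecast $\widehat{X}_2$ based on the information set $\mathcal{F}_1 = \sigma(\{I, x_1\})$ is given by

$$\widehat{X}_2 = \theta \varepsilon_1 = \theta (x_1 - \theta x_0) = \theta x_1 - \theta^2 \varepsilon_0,$$

and has the associated error

$$\mathrm{MSFE}(\widehat{X}_2) = \sigma^2.$$

On the other hand, the forecast that assumes that $x_1$ is a realization of the unique stationary solution
of (2.8) and that uses the corresponding autocovariance function [Bro06, page 63] is given by

$$\widehat{X}_2^S = \frac{\theta}{1 + \theta^2}\, x_1$$

and has the associated error

$$\mathrm{MSFE}(\widehat{X}_2^S) = \sigma^2 \left(1 + \theta^2 - \frac{\theta^2}{1 + \theta^2}\right).$$

We note that, for $\theta \neq 0$,

$$\mathrm{MSFE}(\widehat{X}_2^S) = \sigma^2 \left(1 + \frac{\theta^4}{1 + \theta^2}\right) > \sigma^2 = \mathrm{MSFE}(\widehat{X}_2),$$

which shows that the forecast that we propose has a better performance than the one based on the
stationarity hypothesis. ■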
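The comparison in the MA(1) example can be checked by simulation. The following Monte Carlo sketch assumes Gaussian innovations (an assumption made here only for convenience; the example itself requires no distributional hypothesis beyond the first two moments) and uses a hypothetical function name.

```python
import numpy as np

def ma1_one_observation_mse(theta, sigma=1.0, n_paths=400_000, seed=0):
    """Empirical MSFE of the preset-based and stationarity-based forecasts of X_2."""
    rng = np.random.default_rng(seed)
    e0, e1, e2 = sigma * rng.standard_normal((3, n_paths))
    x0 = e0                      # enlarged preset: x_0 = eps_0
    x1 = e1 + theta * e0         # the single observed value
    x2 = e2 + theta * e1         # the value to be forecast
    forecast_preset = theta * (x1 - theta * x0)
    forecast_stationary = theta / (1.0 + theta**2) * x1
    mse_preset = float(np.mean((x2 - forecast_preset) ** 2))
    mse_stationary = float(np.mean((x2 - forecast_stationary) ** 2))
    return mse_preset, mse_stationary

theta = 0.8
mse_preset, mse_stationary = ma1_one_observation_mse(theta)
theory_stationary = 1.0 + theta**4 / (1.0 + theta**2)
```

Up to sampling noise, `mse_preset` should be close to $\sigma^2 = 1$ and `mse_stationary` close to `theory_stationary`.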

The better performance of the forecast that we propose in the preceding example can be in part due
to the fact that we are using for $\widehat{X}_2$ additional information on the preinnovations that is not taken
advantage of at the time of writing $\widehat{X}_2^S$. In the following example we consider an ARMA(1,1) model
and we see that the statements of Remark 2.2 also hold, even though in this case, unlike in the MA(1)
situation, the information sets on which the two forecasts considered are based are identical.
Example 2.4 Finite sample forecasting for the ARMA(1,1) process.
Consider the model

$$X_t - \phi X_{t-1} = \varepsilon_t + \theta \varepsilon_{t-1}.$$

Then,

• $\pi_0 = 1$, $\pi_j = (-1)^j (\phi + \theta)\, \theta^{j-1}$, for any integer $j \ge 1$,

• $\psi_0 = 1$, $\psi_i = (\phi + \theta)\, \phi^{i-1}$, for any integer $i \ge 1$.

We consider the trivial sample $x_1$ generated by the preset $I = \{x_0, \varepsilon_0\} = I^*$. Using Proposition 2.1 we
have that the one-step ahead forecast $\widehat{X}_2$ based on the information set $\mathcal{F}_1 = \sigma(\xi_1)$ is given by

$$\widehat{X}_2 = (\phi + \theta)\, x_1 - \theta (\phi + \theta)\, x_0, \quad \text{with } \mathrm{MSFE}(\widehat{X}_2) = \sigma^2.$$

On the other hand, the forecast based on the stationarity hypothesis using the same information set
yields

$$\widehat{X}_2^S = \frac{(\theta^2 + \phi\theta + 1)(\theta + \phi)(\phi\theta + 1)}{(\theta^2 + \theta\phi + 1)^2 - \theta^2}\, x_1 - \frac{(\theta + \phi)(\theta\phi + 1)\,\theta}{(\theta^2 + \theta\phi + 1)^2 - \theta^2}\, x_0,$$

and

$$\mathrm{MSFE}(\widehat{X}_2^S) = \sigma^2\, \frac{(\theta^2 + \theta\phi + 1)(\theta^4 + \theta^3\phi + \theta\phi + 1)}{(\theta^2 + \theta\phi + 1)^2 - \theta^2}.$$

It is easy to check that the statement $\mathrm{MSFE}(\widehat{X}_2^S) > \mathrm{MSFE}(\widehat{X}_2)$ is equivalent to $\theta^4 (\theta + \phi)^2 > 0$,
which is satisfied whenever $\theta \neq 0$ and $\theta \neq -\phi$, and shows that the forecast that we propose has a better performance than
the one based on the stationarity hypothesis. ■
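The closed-form expressions for the stationarity-based predictor in the ARMA(1,1) example can be cross-checked against a direct solution of the projection equations. The sketch below assumes the standard autocovariances of a causal ARMA(1,1) process, $\gamma(0) = \sigma^2 (1 + 2\theta\phi + \theta^2)/(1 - \phi^2)$, $\gamma(1) = \sigma^2 (1 + \theta\phi)(\phi + \theta)/(1 - \phi^2)$ and $\gamma(2) = \phi\gamma(1)$; the function names are hypothetical.

```python
import numpy as np

def arma11_stationary_predictor(phi, theta, sigma2=1.0):
    """Best linear predictor of X_2 from (x_1, x_0) under stationarity."""
    c = sigma2 / (1.0 - phi**2)
    g0 = c * (1.0 + 2.0 * theta * phi + theta**2)
    g1 = c * (1.0 + theta * phi) * (phi + theta)
    g2 = phi * g1
    gamma_matrix = np.array([[g0, g1], [g1, g0]])
    gamma_vector = np.array([g1, g2])
    weights = np.linalg.solve(gamma_matrix, gamma_vector)  # weights on (x_1, x_0)
    mse = g0 - weights @ gamma_vector
    return weights, float(mse)

def arma11_closed_form(phi, theta, sigma2=1.0):
    """Closed-form weights and MSFE as written in the example."""
    s = theta**2 + theta * phi + 1.0
    d = s**2 - theta**2
    w1 = s * (theta + phi) * (phi * theta + 1.0) / d
    w0 = -(theta + phi) * (theta * phi + 1.0) * theta / d
    mse = sigma2 * s * (theta**4 + theta**3 * phi + theta * phi + 1.0) / d
    return np.array([w1, w0]), float(mse)

weights_num, mse_num = arma11_stationary_predictor(0.5, 0.4)
weights_cf, mse_cf = arma11_closed_form(0.5, 0.4)
```

Both routes should agree, and the resulting MSFE should exceed $\sigma^2 = 1$, the error of the preset-based forecast.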



3 Forecasting with estimated linear processes 



In Proposition 2.1 we studied forecasting when the parameters of the model are known with total
precision. In this section we explore a more general situation in which the parameters are estimated
out of a sample. Our goal is to quantify the joint mean square forecasting error that comes both from
the stochastic nature of the model (characteristic error) and the estimation error; we will refer to this
aggregation of errors as the total error. This problem has been extensively studied in the references
cited in the introduction, always using the following two main constituents:

• Estimation and forecasting are carried out using independent realizations of the time series
model.

• The model parameter estimator is assumed to be asymptotically normal (for example, the maximum
likelihood estimator); this hypothesis is combined with the use of the so called Delta
Method [Ser80] in order to come up with precise expressions for the total error.

The most detailed formulas for the total error in the VARMA context can be found in [L05], where
the Delta Method is applied to the forecast considered as a smooth function of the model parameters.
If we assume that the sample out of which we want to forecast is a realization of the unique stationary
solution of a VAR model, an explicit expression for this error that involves the time-independent
autocovariance of the process can be written down by following this approach [L05, page 97]. In the
VARMA setup, the situation is more complicated [L05, page 490] and the resulting formula requires the
use of a Monte Carlo estimation.



In subsection 3.1 we start by obtaining a formula for the total error using a different approach at the
time of invoking the Delta Method; our strategy uses this method at a more primitive level by considering
the parameters of the linear representation of the process seen as a function of the ARMA coefficients.
We show that, discarding higher order terms in $1/\sqrt{T}$, where $T$ is the sample size used for estimation,
the resulting formula for the total error can be approximated by a completely explicit expression that
involves only the model parameters and the covariance matrix associated to the asymptotically normal
estimator of the ARMA coefficients.

In subsection 3.2 we rederive the total error formula by H. Lütkepohl [L05, page 490] and show
that it can be rewritten as explicitly as ours without using any stationarity hypothesis or Monte Carlo
simulations. Moreover, we show that this formula coincides with the approximated one obtained in
subsection 3.1 by discarding higher order terms in $1/\sqrt{T}$.



3.1 The total error of finite sample based forecasting 

Consider the causal and invertible ARMA($p$, $q$) process $\{X_t\}$ determined by the equivalent relations

$$\Phi(L) X_t = \Theta(L) \varepsilon_t, \qquad X_t = \sum_{i=0}^{\infty} \psi_i \varepsilon_{t-i}, \qquad \varepsilon_t = \sum_{j=0}^{\infty} \pi_j X_{t-j}, \qquad \{\varepsilon_t\} \sim \mathrm{IID}(0, \sigma^2), \tag{3.1}$$

and denote $\Psi := \{\psi_0, \psi_1, \ldots\}$, $\Pi := \{\pi_0, \pi_1, \ldots\}$. In Proposition 2.1 we studied forecasting for the
process (3.1) when the parameters $\Psi$ or $\Pi$ of the model are known with total precision; in this section
we suppose that these parameters are estimated by using a sample independent from the one that
will be used for forecasting. A preferable assumption would have been that the parameters $\Psi$
are estimated based on the same sample that we intend to use for prediction, but exploiting only
data up to the forecasting origin; Samaranayake [SH88] and Basu et al. [BS86] have shown that many
results obtained in the presence of the independence hypothesis remain valid under this more reasonable
assumption.

Under the independence hypothesis, the model coefficients $\widehat{\Psi}$ or $\widehat{\Pi}$ become random variables
independent from the process $X$ and the innovations $\varepsilon$. Moreover, we assume that these random variables
are asymptotically normal, as for example in the case of maximum likelihood estimation of the ARMA
coefficients.

For the sake of completeness, we start by recalling the Delta Method, which will be used profusely in
the following pages. A proof and related asymptotic statements can be found in [Ser80].

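Lemma 3.1 (Delta Method) Let $\widehat{\beta}$ be an asymptotically normal estimator for the vector parameter
$\beta \in \mathbb{R}^n$, that is, there exists a covariance matrix $\Sigma$ such that

$$\sqrt{T}\,(\widehat{\beta} - \beta) \xrightarrow[T \to \infty]{\text{dist}} N(0, \Sigma),$$

where $T$ is the sample size. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a vector valued continuously differentiable function and
let $J_f$ be its Jacobian matrix, that is, $((J_f)(\beta))_{ij} := (\partial f_i / \partial \beta_j)(\beta)$. If $J_f(\beta) \neq 0$, then

$$\sqrt{T}\,\big(f(\widehat{\beta}) - f(\beta)\big) \xrightarrow[T \to \infty]{\text{dist}} N\big(0, J_f \Sigma J_f'\big).$$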
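As a small numerical illustration of the Delta Method, the sketch below computes the approximate covariance $J_f \Sigma J_f' / T$ of $f(\widehat{\beta})$ with a Jacobian obtained by central finite differences. The finite-difference Jacobian is a convenience chosen here, not a technique taken from the text, and the function names are hypothetical.

```python
import numpy as np

def numerical_jacobian(f, beta, eps=1e-6):
    """Central finite-difference Jacobian of f at beta (rows: outputs, columns: inputs)."""
    beta = np.asarray(beta, dtype=float)
    f0 = np.atleast_1d(f(beta))
    jac = np.zeros((f0.size, beta.size))
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        jac[:, j] = (np.atleast_1d(f(beta + step)) - np.atleast_1d(f(beta - step))) / (2 * eps)
    return jac

def delta_method_cov(f, beta, sigma, sample_size):
    """Approximate covariance of f(beta_hat): J Sigma J' / T."""
    jac = numerical_jacobian(f, beta)
    return jac @ np.asarray(sigma, dtype=float) @ jac.T / sample_size

# AR(1) illustration: beta = phi with asymptotic variance 1 - phi^2, and f(phi) = phi^h.
phi, h, T = 0.5, 3, 100
cov = delta_method_cov(lambda b: np.array([b[0] ** h]), [phi], [[1 - phi**2]], T)
expected = (h * phi ** (h - 1)) ** 2 * (1 - phi**2) / T
```

For this scalar case the result should match $(h\phi^{h-1})^2 (1 - \phi^2)/T$.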



The next ingredient needed in the formulation of the main result of this section is the covariance
matrix $\Sigma_{\Xi_P}$ associated to the asymptotic normal character of the estimator $\widehat{\Xi}_P$ for the parameters
$\Xi_P := (\psi_1, \ldots, \psi_P, \pi_1, \ldots, \pi_P)$, for some integer $P$. This is spelled out in the following lemma whose
proof is a straightforward combination of the Delta Method with the results in Section 8.8 of [Bro06].



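Lemma 3.2 Let $\{X_t\}$ be a causal and invertible ARMA($p$, $q$) process like in (3.1). Let $\Phi := (\phi_1, \ldots, \phi_p)'$,
$\Theta := (\theta_1, \ldots, \theta_q)'$, and $\beta := (\Phi', \Theta')'$ be the ARMA parameter vectors and let $\Xi_P := (\psi_1, \ldots, \psi_P, \pi_1, \ldots, \pi_P)$
be a collection of length $2P$ of the parameters that provide the linear causal and invertible representations
of that model. Then:

(i) The maximum likelihood estimator $\widehat{\beta}$ of $\beta$ is asymptotically normal:

$$\sqrt{T}\,(\widehat{\beta} - \beta) \xrightarrow[T \to \infty]{\text{dist}} N(0, \Sigma_\beta), \quad \text{with} \quad \Sigma_\beta = \sigma^2 \begin{pmatrix} E[\mathbf{U}_t \mathbf{U}_t'] & E[\mathbf{U}_t \mathbf{V}_t'] \\ E[\mathbf{V}_t \mathbf{U}_t'] & E[\mathbf{V}_t \mathbf{V}_t'] \end{pmatrix}^{-1},$$

where $\mathbf{U}_t := (U_t, \ldots, U_{t+1-p})'$, $\mathbf{V}_t := (V_t, \ldots, V_{t+1-q})'$, and $\{U_t\}$ and $\{V_t\}$ are the autoregressive
processes determined by

$$\Phi(L) U_t = \varepsilon_t \quad \text{and} \quad \Theta(L) V_t = \varepsilon_t.$$

(ii) Consider the elements in $\Xi_P$ as a function of $\beta$, that is, $\Xi_P(\beta) := (\psi_1(\beta), \ldots, \psi_P(\beta), \pi_1(\beta), \ldots, \pi_P(\beta))$.
Then, by the Delta Method we have that

$$\sqrt{T}\,(\widehat{\Xi}_P - \Xi_P) \xrightarrow[T \to \infty]{\text{dist}} N(0, \Sigma_{\Xi_P}), \quad \text{where} \quad \Sigma_{\Xi_P} := J_{\Xi_P} \Sigma_\beta J_{\Xi_P}' \tag{3.2}$$

and $(J_{\Xi_P})_{ij} = \partial (\Xi_P)_i / \partial \beta_j$, $i = 1, \ldots, 2P$, $j = 1, \ldots, p + q$. Details on how to algorithmically compute
the Jacobian $J_{\Xi_P}$ are provided in Appendix 7.2.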
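For an ARMA(1,1) model the objects in this lemma can be written down explicitly, which makes a compact numerical sketch possible. The code below assumes the convention $X_t - \phi X_{t-1} = \varepsilon_t + \theta \varepsilon_{t-1}$, for which $E[U_t^2] = \sigma^2/(1-\phi^2)$, $E[V_t^2] = \sigma^2/(1-\theta^2)$ and $E[U_t V_t] = \sigma^2/(1+\phi\theta)$. The Jacobian is approximated by finite differences, which is a shortcut chosen here and not the algorithm of the appendix; all function names are hypothetical.

```python
import numpy as np

def sigma_beta_arma11(phi, theta):
    """Asymptotic covariance of the ML estimator of (phi, theta); sigma^2 cancels out."""
    m = np.array([[1.0 / (1.0 - phi**2), 1.0 / (1.0 + phi * theta)],
                  [1.0 / (1.0 + phi * theta), 1.0 / (1.0 - theta**2)]])
    return np.linalg.inv(m)

def xi_arma11(beta, P):
    """(psi_1..psi_P, pi_1..pi_P) of an ARMA(1,1) as a function of beta = (phi, theta)."""
    phi, theta = beta
    i = np.arange(1, P + 1)
    psi = (phi + theta) * phi ** (i - 1)
    pi = (-1.0) ** i * (phi + theta) * theta ** (i - 1)
    return np.concatenate([psi, pi])

def sigma_xi_arma11(phi, theta, P, eps=1e-6):
    """Covariance J Sigma_beta J' of the linear-representation weights."""
    beta = np.array([phi, theta], dtype=float)
    jac = np.zeros((2 * P, 2))
    for j in range(2):
        step = np.zeros(2)
        step[j] = eps
        jac[:, j] = (xi_arma11(beta + step, P) - xi_arma11(beta - step, P)) / (2 * eps)
    return jac @ sigma_beta_arma11(phi, theta) @ jac.T

S_beta = sigma_beta_arma11(0.5, 0.4)
S_xi = sigma_xi_arma11(0.5, 0.4, P=3)
```

Since $\psi_1 = \phi + \theta = -\pi_1$, the entry of `S_xi` for $\psi_1$ equals the sum of all entries of `S_beta`, and its covariance with $\pi_1$ is the negative of that value.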

The next theorem is the main result in this section. Its proof can be found in the appendix. 

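Theorem 3.3 Let $\xi_T = \{x_1, \ldots, x_T\}$ be a sample obtained as a realization of the causal and invertible
ARMA($p$, $q$) model in (3.1) using a preset $I$. In order to forecast out of this sample, we first estimate
the parameters of the model $\Psi = \{\psi_0, \psi_1, \ldots\}$, $\Pi = \{\pi_0, \pi_1, \ldots\}$ based on another sample $\xi'_T$ that we
assume to be independent of $\xi_T$, using the maximum likelihood estimator $\widehat{\beta} := (\widehat{\Phi}', \widehat{\Theta}')'$ of the ARMA
parameters. If we use the forecasting scheme introduced in Proposition 2.1 then:

(i) The optimal multistep forecast $\widehat{X}_{T+h}$ for $X_{T+h}$ based on the information set $\mathcal{F}_T$ generated by the
sample $\xi_T$ and using the coefficients estimated on the independent sample $\xi'_T$ is

$$\widehat{X}_{T+h} = \sum_{i=h}^{T+h-1+r} \widehat{\psi}_i\, \widetilde{\varepsilon}_{T+h-i}, \tag{3.3}$$

where $r = \max\{p, q\}$ and $\widetilde{\varepsilon}_t := \sum_{j=0}^{t+r-1} \widehat{\pi}_j x_{t-j}$.

(ii) The mean square forecasting error (MSFE) associated to this forecast is

$$\mathrm{MSFE}(\widehat{X}_{T+h}) = \sigma^2 \sum_{i=0}^{h-1} \psi_i^2 + \sigma^2 E\Bigg[ \sum_{i=h}^{P} \psi_i^2 - 2 \sum_{i=h}^{P} \sum_{j=0}^{P-i} \sum_{k=0}^{P-i-j} \psi_{i+j+k}\, \widehat{\psi}_i \widehat{\pi}_j \psi_k + \sum_{i=h}^{P} \sum_{j=0}^{P-i} \sum_{k=0}^{P-i-j} \sum_{i'=h}^{P} \sum_{j'=0}^{P-i'} \sum_{k'=0}^{P-i'-j'} \widehat{\psi}_i \widehat{\pi}_j \psi_k\, \widehat{\psi}_{i'} \widehat{\pi}_{j'} \psi_{k'}\, \delta_{i+j+k,\, i'+j'+k'} \Bigg], \tag{3.4}$$

where $P = T + h - 1 + r$.

The first summand will be referred to as the characteristic forecasting error and the second
one as the estimation based forecasting error. Notice that the characteristic error coincides
with (2.6) and amounts to the forecasting error committed when the model parameters are known
with total precision.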
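(iii) Let $\Xi_P := (\psi_1, \ldots, \psi_P, \pi_1, \ldots, \pi_P)$ with $P = T + h - 1 + r$. Using the notation introduced in
Lemma 3.2 and discarding higher order terms in $1/\sqrt{T}$, the MSFE in (3.4) can be approximated
by

$$\mathrm{MSFE}(\widehat{X}_{T+h}) = \sigma^2 \sum_{i=0}^{h-1} \psi_i^2 + \frac{\sigma^2}{T} \Bigg[ \sum_{i=h}^{P} (\Sigma_{\Xi_P})_{i,i} + 2 \sum_{i=h}^{P} \sum_{j=0}^{P-i} \sum_{k=0}^{P-i-j} \psi_i \psi_k\, (\Sigma_{\Xi_P})_{i+j+k,\, j+P} + \sum_{i=h}^{P} \sum_{j=0}^{P-i} \sum_{k=0}^{P-i-j} \sum_{i'=h}^{P} \sum_{j'=0}^{P-i'} \sum_{k'=0}^{P-i'-j'} \psi_i \psi_k \psi_{i'} \psi_{k'}\, (\Sigma_{\Xi_P})_{j+P,\, j'+P}\, \delta_{i+j+k,\, i'+j'+k'} \Bigg], \tag{3.5}$$

where $\Sigma_{\Xi_P}$ is the covariance matrix in (3.2).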



3.2 On Lütkepohl's formula for the total forecasting error

As we already pointed out, H. Lütkepohl [L05, pages 97 and 490] has proposed formulas for VARMA
models similar to the ones presented in Theorem 3.3 based on a different application of the Delta Method.
In this section, we rederive Lütkepohl's result in the ARMA context and show that it is identical to
the approximated formula (3.5) presented in part (iii) of Theorem 3.3. In passing, this conveys that
Lütkepohl's result can be explicitly formulated and computed using neither stationarity hypotheses nor
Monte Carlo simulations.

The key idea behind Lütkepohl's formula is applying the Delta Method by thinking of the forecast
$\widehat{X}_{T+h}$ in question as a differentiable function $\widehat{X}_{T+h}(\beta)$ of the model parameters $\beta := (\Phi', \Theta')'$. In
order to develop this idea further, consider first the information sets $\mathcal{F}_T := \sigma(\xi_T)$ and $\mathcal{F}'_T := \sigma(\xi'_T)$
generated by two independent samples $\xi_T$ and $\xi'_T$ of the same size. The sample $\xi_T$ is used for forecasting
and hence $\mathcal{F}_T$ determines the forecast $\widehat{X}_{T+h}(\beta)$ once the model parameters $\beta$ have been specified. The
sample $\xi'_T$ is in turn used for model estimation and hence $\mathcal{F}'_T$ determines $\widehat{\beta}$. Consequently, the random
variable $\widehat{X}_{T+h}(\widehat{\beta})$ is fully determined by $\mathcal{F}_T$ and $\mathcal{F}'_T$. In this setup, a straightforward application of the
statement in Lemma 3.1 shows that



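$$\sqrt{T}\, \Big( \widehat{X}_{T+h}(\widehat{\beta}) - \widehat{X}_{T+h}(\beta) \,\Big|\, \mathcal{F}_T \Big) \xrightarrow[T \to \infty]{\text{dist}} N\left( 0,\ \frac{\partial \widehat{X}_{T+h}}{\partial \beta}\, \Sigma_\beta\, \frac{\partial \widehat{X}_{T+h}}{\partial \beta}' \right), \tag{3.6}$$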



which, as presented in the next result, is enough to compute the total forecasting error.

Theorem 3.4 In the same setup as in Theorem 3.3, the total error associated to the forecast in (3.3)
can be approximated by

$$\mathrm{MSFE}(\widehat{X}_{T+h}) = \sigma^2 \sum_{i=0}^{h-1} \psi_i^2 + \frac{1}{T}\, E\left[ \frac{\partial \widehat{X}_{T+h}}{\partial \beta}\, \Sigma_\beta\, \frac{\partial \widehat{X}_{T+h}}{\partial \beta}' \right]. \tag{3.7}$$

We refer to this expression as Lütkepohl's formula for the total forecasting error. Moreover:



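(i) Lütkepohl's formula coincides with the approximate expression for the total error stated in (3.5). In
particular, the second summand in Lütkepohl's formula, which describes the contribution to the
total error given by the estimation error, can be expressed as:

$$E\left[ \frac{\partial \widehat{X}_{T+h}}{\partial \beta}\, \Sigma_\beta\, \frac{\partial \widehat{X}_{T+h}}{\partial \beta}' \right] = \sigma^2 \Bigg[ \sum_{i=h}^{P} (\Sigma_{\Xi_P})_{i,i} + 2 \sum_{i=h}^{P} \sum_{j=0}^{P-i} \sum_{k=0}^{P-i-j} \psi_i \psi_k\, (\Sigma_{\Xi_P})_{i+j+k,\, j+P} + \sum_{i=h}^{P} \sum_{j=0}^{P-i} \sum_{k=0}^{P-i-j} \sum_{i'=h}^{P} \sum_{j'=0}^{P-i'} \sum_{k'=0}^{P-i'-j'} \psi_i \psi_k \psi_{i'} \psi_{k'}\, (\Sigma_{\Xi_P})_{j+P,\, j'+P}\, \delta_{i+j+k,\, i'+j'+k'} \Bigg]. \tag{3.8}$$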
(ii) If we assume that the samples used for forecasting are second order stationary realizations of the
model (3.1) and $\gamma : \mathbb{Z} \to \mathbb{R}$ is the corresponding time invariant autocovariance function, then the
estimation error can be expressed as:



$$\sum_{i=h}^{P} \sum_{j=0}^{P-i} \sum_{i'=h}^{P} \sum_{j'=0}^{P-i'} \big( \cdots \big) \tag{3.9}$$
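A hand computation for the AR(1) model may help to see how Lütkepohl's formula behaves under the stationarity assumption of part (ii). The sketch below assumes an AR(1) model $X_t = \phi X_{t-1} + \varepsilon_t$, for which the forecast is $\phi^h x_T$, its derivative with respect to $\phi$ is $h\phi^{h-1} x_T$, the asymptotic variance of the estimator is $1 - \phi^2$, and $E[x_T^2] = \sigma^2/(1-\phi^2)$. Under these assumptions the estimation error reduces to $\sigma^2 h^2 \phi^{2(h-1)}/T$. The function name is hypothetical and the small value of $T$ is used only for illustration, since the formula is asymptotic.

```python
def ar1_total_error(phi, h, T, sigma2=1.0):
    """Approximate total MSFE of the h-step AR(1) forecast with an estimated phi.

    Assumes a stationary forecasting sample and an independent estimation
    sample of size T.
    """
    characteristic = sigma2 * sum(phi ** (2 * i) for i in range(h))
    estimation = sigma2 * h**2 * phi ** (2 * (h - 1)) / T
    return characteristic + estimation

# The characteristic error grows with h, but the total error need not.
errors = [ar1_total_error(0.3, h, T=5) for h in (1, 2, 3)]
```

With $\phi = 0.3$ and $T = 5$ the total error at horizon 3 is smaller than at horizon 2, an instance of the non-monotonic behavior in the horizon mentioned in the introduction.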

4 Finite sample forecasting of temporally aggregated linear processes

The goal of this section is to propose a forecasting scheme for temporal aggregates based on using high
frequency data for estimation purposes and the corresponding temporally aggregated model and data
for the forecasting task. We show, using the formulas introduced in the previous section, that in some
cases this strategy can yield forecasts of higher quality than those based exclusively on high
frequency data, which are presented in the literature as the best performing option [L86b, L87, L09].

We start by recalling general statements about temporal aggregation that we need in the sequel. We
then proceed by using various extensions of the results in Section 3 regarding the computation of total
forecasting errors with estimated series in order to compare the performances of the schemes that we
just indicated.

4.1 Temporal aggregation of time series 

The linear temporal aggregation of time series requires the use of the elements provided in the following 
definition. 

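Definition 4.1 Given $K \in \mathbb{N}$, $X$ a time series, and $w = (w_1, \ldots, w_K)' \in \mathbb{R}^K$, define the $K$-period
projection $p_K$ of $X$ as

$$p_K(X) := \{X_j^{(K)}\}_{j \in \mathbb{Z}},$$

where $X_j^{(K)} := (X_{(j-1)K+1}, \ldots, X_{jK}) \in \mathbb{R}^K$, and the corresponding temporally aggregated time series
$Y$ as

$$Y := I_w \circ p_K(X), \tag{4.1}$$

where $I_w$ maps each block $X_j^{(K)}$ to its inner product $\langle w, X_j^{(K)} \rangle$ with $w$.

The integer $K$ is called the temporal aggregation period and the vector $w$ the temporal aggregation
vector. Notice that the aggregated time series $Y$ is indexed using the time scale $\tau = mK$, with $m \in \mathbb{Z}$,
and its components are given by the $X$-aggregates $X^w_{t+K}$ defined by

$$X^w_{t+K} := w_1 X_{t+1} + \cdots + w_K X_{t+K} = Y_\tau. \tag{4.2}$$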
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Definition 4.1 can be reformulated in terms of the backward shift operator L as: 

A'-l 



i=0 



where 



(X 



(XKj + l) 



(4.3) 



and the indices of the components (Zj)^.g^ of Z := X^jLo "^K-iL^ (^) are uniquely determined by the 
choice Zi :— wiXi + ... + wkXk- 

There are four important particular cases covered by Definition 4.1, namely:

(i) Stock aggregation (also called systematic sampling, skip-sampling, point-in-time sampling): it is
obtained out of (4.1) or (4.3) by setting $w = (0, \ldots, 0, 1)'$.

(ii) Flow aggregation: $w = (1, \ldots, 1)'$.

(iii) Averaging: $w = (1/K, 1/K, \ldots, 1/K)'$.

(iv) Weighted averaging: $w = \frac{1}{K}(\xi_1, \ldots, \xi_K)'$ such that $\sum_{i=1}^{K} \xi_i = K$.
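The four aggregation schemes above differ only in the choice of the vector $w$. The following is a minimal sketch, assuming NumPy and a toy sample; the function name `temporal_aggregate` is hypothetical and only illustrates the block-wise inner product in (4.2).

```python
import numpy as np

def temporal_aggregate(x, w):
    """Aggregate a sample x of length T = M*K with the K-period vector w, as in (4.2)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    K = w.size
    if x.size % K != 0:
        raise ValueError("the sample length must be a multiple of K")
    blocks = x.reshape(-1, K)  # row j holds (x_{(j-1)K+1}, ..., x_{jK})
    return blocks @ w          # inner product of w with each block

# Toy sample x_1, ..., x_12 (an assumption made only for illustration).
x = np.arange(1.0, 13.0)
K = 3
stock = temporal_aggregate(x, [0, 0, 1])          # last observation of each block
flow = temporal_aggregate(x, np.ones(K))          # sum over each block
average = temporal_aggregate(x, np.ones(K) / K)   # mean over each block
```

With this toy sample, stock aggregation keeps every third observation, while flow aggregation and averaging return block sums and block means, respectively.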



4.2 Multistep approach to the forecasting of linear temporal aggregates

Let $X$ be a time series and $w = (w_1, \ldots, w_K)'$ a K-period aggregation vector. Given a finite time
realization $\xi_T = \{x_1, \ldots, x_T\}$ of $X$ such that $T = MK$ with $M \in \mathbb{N}$, we aim at forecasting the aggregate
$w_1 X_{T+1} + \cdots + w_K X_{T+K}$. There are two obvious ways to carry this out; first, we can produce a
multistep forecast $\widehat{X}_{T+1}, \ldots, \widehat{X}_{T+K}$ for $X$ out of which we can obtain the forecast of the aggregate by
setting $\widehat{X}^w_{T+K} := w_1 \widehat{X}_{T+1} + \cdots + w_K \widehat{X}_{T+K}$. Second, we can temporally aggregate $X$ using (4.1) into
the time series $Y$ given by

$$Y = I_w \circ p_K(X)$$

and produce a one-step forecast for $Y$. The following result recalls a well known comparison [AW72, L84, L86b, L89a]
of the forecasting performances of the two schemes that we just described using the
mean square characteristic error as an optimality criterion. In that setup, given an information set
encoded as a $\sigma$-algebra, the optimal forecast is given by the conditional expectation with respect to
it [Ham94, page 72]. Given a time series $X$, we will denote in what follows by $\sigma(\xi_T)$ the information
set generated by a realization $\xi_T = \{x_1, \ldots, x_T\}$ of length $T$ of $X$ and the preset $I$ used to produce it;
more specifically

$$\sigma(\xi_T) := \sigma\left(I \cup \{x_1, \ldots, x_T\}\right).$$



Proposition 4.2 Let $X$ be a time series and $w = (w_1, \ldots, w_K)'$ a K-period aggregation vector. Let
$Y = I_w \circ p_K(X)$ be the corresponding temporally aggregated time series. Let $T = MK$ with $M, T \in \mathbb{N}$
and consider $\mathcal{F}_T := \sigma(\xi_T)$, $\mathcal{F}_M := \sigma(\eta_M)$ the information sets associated to two histories of $X$ and $Y$
of length $T$ and $M$, respectively, related to each other by temporal aggregation. Then:

$$\mathrm{MSFE}\left(E\left[X^w_{T+K} \mid \mathcal{F}_T\right]\right) \leq \mathrm{MSFE}\left(E\left[Y_{M+1} \mid \mathcal{F}_M\right]\right). \tag{4.4}$$

Remark 4.3 The inequality (4.4) has been studied in detail in the VARMA context by H. Lütkepohl [L86b,
L87, L09], who has fully characterized the situations in which the two predictors are identical and hence
have exactly the same performance. This condition is stated and exploited in Section 5, where we illustrate
with specific examples the performance of the forecasting scheme that we present in the following
pages.

In the next two results we spell out the characteristic and the total errors associated to a multistep
approach to the forecast of linear aggregates. The characteristic error is given in Proposition 4.4 and
the total error is provided in Theorem 4.7 under the same independence hypothesis between the samples
used for estimation and forecasting that was already invoked in Theorem 3.3.



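Proposition 4.4 Let $X$ be a time series model as in (3.1), $r = \max\{p, q\}$, $K$ a temporal aggregation
period, $w = (w_1, \ldots, w_K)'$ an aggregation vector, and $\mathcal{F}_T := \sigma(\xi_T)$ the information set generated by
a realization $\xi_T = \{x_1, \ldots, x_T\}$ of length $T$ of $X$. Let $\widehat{X}^w_{T+K}$ be the forecast of the X-aggregate $X^w_{T+K} :=
\sum_{i=1}^{K} w_i X_{T+i}$ based on $\mathcal{F}_T$ using the forecasting scheme in Proposition 2.1. Then:

(i) The forecast $\widehat{X}^w_{T+K}$ is given by

$$\widehat{X}^w_{T+K} = \sum_{i=1}^{K} w_i \sum_{j=i}^{T+i-1+r} \psi_j \varepsilon_{T+i-j}. \tag{4.5}$$

(ii) The corresponding mean square forecasting characteristic error is:

$$\mathrm{MSFE}\left(\widehat{X}^w_{T+K}\right) = E\left[\left(X^w_{T+K} - \widehat{X}^w_{T+K}\right)^2\right] = \sigma^2 \left[\sum_{i=1}^{K} w_i^2 \sum_{l=0}^{i-1} \psi_l^2 + 2 \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} w_i w_j \sum_{l=0}^{i-1} \psi_l \psi_{j-i+l}\right]. \tag{4.6}$$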

Example 4.5 Forecast of stock temporal aggregates. It is a particular case of the statement in Proposition 4.4
obtained by taking $w = (0, \ldots, 0, 1)'$. In this case

$$\widehat{X}^w_{T+K} = \widehat{X}_{T+K} = \sum_{j=K}^{T+K-1+r} \psi_j \varepsilon_{T+K-j}.$$

This shows that the forecast of the stock temporal aggregate coincides with the K-multistep forecast of
the original time series. Consequently, it is easy to see by using (4.6) and (2.6) that

$$\mathrm{MSFE}\left(\widehat{X}^w_{T+K}\right) = \mathrm{MSFE}\left(\widehat{X}_{T+K}\right).$$

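Example 4.6 Forecast of flow temporal aggregates. It is a particular case of the statement in Proposition 4.4
obtained by taking $w = (1, \ldots, 1)'$. In this case

$$\widehat{X}^w_{T+K} = \sum_{i=1}^{K} \sum_{j=i}^{T+i-1+r} \psi_j \varepsilon_{T+i-j}.$$

Consequently,

$$\mathrm{MSFE}\left(\widehat{X}^w_{T+K}\right) = \sigma^2 \left[\sum_{j=0}^{K-1} (K-j) \psi_j^2 + 2 \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \sum_{l=0}^{i-1} \psi_l \psi_{j-i+l}\right] = \sigma^2 \left[\sum_{j=0}^{K-1} (K-j) \psi_j^2 + 2 \sum_{i=0}^{K-1} \sum_{j=i+1}^{K-1} (K-j) \psi_i \psi_j\right].$$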
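The characteristic error of the aggregate forecast can be checked numerically. The sketch below is an illustration under stated assumptions: it computes the variance of the aggregate forecast error directly from the coefficients of the future innovations $\varepsilon_{T+1}, \ldots, \varepsilon_{T+K}$, which is the quantity that (4.6) expands, and compares it with the closed flow expression of Example 4.6. The function names and the AR(1) coefficients $\psi_l = 0.5^l$ are hypothetical choices.

```python
import numpy as np

def characteristic_msfe(psi, w, sigma2=1.0):
    """Variance of the aggregate forecast error; psi[l] is psi_l with psi[0] = 1."""
    psi = np.asarray(psi, dtype=float)
    w = np.asarray(w, dtype=float)
    K = w.size
    # Row i-1 holds the coefficients of eps_{T+1}, ..., eps_{T+K} in the
    # forecast error of X_{T+i}: psi_{i-s} multiplies eps_{T+s} for s <= i.
    E = np.zeros((K, K))
    for i in range(1, K + 1):
        for s in range(1, i + 1):
            E[i - 1, s - 1] = psi[i - s]
    c = w @ E  # coefficients of the innovations in the aggregate error
    return sigma2 * float(c @ c)

def flow_msfe(psi, K, sigma2=1.0):
    """Closed expression for flow aggregation (w = (1, ..., 1)')."""
    total = sum((K - j) * psi[j] ** 2 for j in range(K))
    total += 2 * sum((K - j) * psi[i] * psi[j]
                     for i in range(K) for j in range(i + 1, K))
    return sigma2 * total

K = 3
psi = 0.5 ** np.arange(K)  # MA(infinity) coefficients of an AR(1) with phi = 0.5
general = characteristic_msfe(psi, np.ones(K))
closed = flow_msfe(psi, K)
stock = characteristic_msfe(psi, [0, 0, 1])
```

For stock aggregation the same function reduces to $\sigma^2 \sum_{l=0}^{K-1} \psi_l^2$, the K-step error of the original series, in agreement with Example 4.5.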






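Theorem 4.7 (Multistep forecasting of linear temporal aggregates) Consider a sample $\xi_T =
\{x_1, \ldots, x_T\}$ obtained as the realization of a model of the type (3.1) using the preset $I$. In order to
forecast out of this sample, we first estimate the parameters of the model $\widehat{\Psi} = \{\widehat{\psi}_0, \widehat{\psi}_1, \ldots\}$ and $\widehat{\Pi} =
\{\widehat{\pi}_0, \widehat{\pi}_1, \ldots\}$ based on another sample $\xi'_T$ of the same size that we assume to be independent of $\xi_T$. Let
$w = (w_1, \ldots, w_K)'$ be an aggregation vector and let $\widehat{X}^w_{T+K}$ be the forecast of the aggregate $X^w_{T+K} :=
\sum_{h=1}^{K} w_h X_{T+h}$ based on $\mathcal{F}_T := \sigma(I \cup \xi_T)$ using Proposition 4.4 and the estimated parameters $\widehat{\Psi}$, $\widehat{\Pi}$.
Then:

(i) The optimal forecast $\widehat{X}^w_{T+K}$ for $X^w_{T+K}$ given the samples $\xi_T$ and $\xi'_T$ is

$$\widehat{X}^w_{T+K} = \sum_{h=1}^{K} w_h \sum_{j=h}^{T+h-1+r} \widehat{\psi}_j \widetilde{\varepsilon}_{T+h-j}, \tag{4.7}$$

where $r = \max\{p, q\}$ and $\widetilde{\varepsilon}_t := \sum_{j=0}^{t+r-1} \widehat{\pi}_j x_{t-j}$.

(ii) The mean square forecasting error associated to this forecast is

$$\mathrm{MSFE}\left(\widehat{X}^w_{T+K}\right) = \langle w, (A + B + C) w \rangle,$$

where $A$, $B$, $C$ are the matrices with components given by

$$A_{hh'} = \sigma^2 \sum_{i=0}^{P(h)} \sum_{i'=0}^{P(h')} \psi_i \psi_{i'} \delta_{h-i, h'-i'}, \tag{4.8}$$

$$B_{hh'} = -2\sigma^2 \sum_{i=0}^{P(h)} \sum_{i'=h'}^{P(h')} \sum_{j'=0}^{P(h')-i'} \sum_{k'=0}^{P(h')-i'-j'} \psi_i \psi_{k'} E\left[\widehat{\psi}_{i'} \widehat{\pi}_{j'}\right] \delta_{h-i, h'-i'-j'-k'}, \tag{4.9}$$

$$C_{hh'} = \sigma^2 \sum_{i=h}^{P(h)} \sum_{j=0}^{P(h)-i} \sum_{k=0}^{P(h)-i-j} \sum_{i'=h'}^{P(h')} \sum_{j'=0}^{P(h')-i'} \sum_{k'=0}^{P(h')-i'-j'} \psi_k \psi_{k'} E\left[\widehat{\psi}_i \widehat{\pi}_j \widehat{\psi}_{i'} \widehat{\pi}_{j'}\right] \delta_{h-i-j-k, h'-i'-j'-k'}, \tag{4.10}$$

with $P(h) = T + h - 1 + r$, $P(h') = T + h' - 1 + r$. Notice that $A_{hh'} = A^{char}_{hh'} + A^{res}_{hh'}$, where

$$A^{char}_{hh'} = \sigma^2 \sum_{i=0}^{h-1} \sum_{i'=0}^{h'-1} \psi_i \psi_{i'} \delta_{h-i, h'-i'} \quad \text{and} \quad A^{res}_{hh'} = \sigma^2 \sum_{i=h}^{P(h)} \sum_{i'=h'}^{P(h')} \psi_i \psi_{i'} \delta_{h-i, h'-i'}, \tag{4.11}$$

and $\langle w, A^{char} w \rangle$ is the characteristic forecasting error in part (ii) of Proposition 4.4.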

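(iii) Let $\Xi_P := (\psi_1, \ldots, \psi_{P(K)}, \pi_1, \ldots, \pi_{P(K)})$, with $P(K) = T + K - 1 + r$. Using the notation
introduced in Lemmas 3.1 and 3.2 and discarding higher order terms in $1/\sqrt{T}$, the MSFE in part (ii)
can be approximated by

$$\mathrm{MSFE}\left(\widehat{X}^w_{T+K}\right) = \langle w, (A^{char} + D + F + G) w \rangle, \tag{4.12}$$

where

$$A^{char}_{hh'} = \sigma^2 \sum_{i=0}^{h-1} \sum_{i'=0}^{h'-1} \psi_i \psi_{i'} \delta_{h-i, h'-i'},$$

$$D_{hh'} = \frac{\sigma^2}{T} \sum_{i=h}^{P(h)} \sum_{i'=h'}^{P(h')} \left(\Sigma_{\Xi_P}\right)_{i,i'} \delta_{h-i, h'-i'},$$

$$F_{hh'} = \frac{2\sigma^2}{T} \sum_{i=h}^{P(h)} \sum_{i'=h'}^{P(h')} \sum_{j'=0}^{P(h')-i'} \sum_{k'=0}^{P(h')-i'-j'} \psi_i \psi_{k'} \left(\Sigma_{\Xi_P}\right)_{i', P(K)+j'} \delta_{h-i, h'-i'-j'-k'},$$

$$G_{hh'} = \frac{\sigma^2}{T} \sum_{i=h}^{P(h)} \sum_{j=0}^{P(h)-i} \sum_{k=0}^{P(h)-i-j} \sum_{i'=h'}^{P(h')} \sum_{j'=0}^{P(h')-i'} \sum_{k'=0}^{P(h')-i'-j'} \psi_i \psi_k \psi_{i'} \psi_{k'} \left(\Sigma_{\Xi_P}\right)_{j+P(K), j'+P(K)} \delta_{h-i-j-k, h'-i'-j'-k'}.$$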



Remark 4.8 In order to compute the total error in (4.12) it is necessary to determine the covariance
matrix $\Sigma_{\Xi_P}$. By Lemma 3.2 it can be obtained out of the covariance matrix $\Sigma_\beta$ associated to the
estimator of the ARMA parameters combined with the Jacobian $J_{\Xi_P}$. Details on how to algorithmically
compute this Jacobian are provided in Appendix 7.2.



Remark 4.9 Notice that all the matrices involved in the statement of Theorem 4.7 are symmetric 
except for B and F. 

4.3 A hybrid forecasting scheme using aggregated time series models 

In the previous subsection we presented a forecasting method for linear temporal aggregates based
exclusively on the use of high frequency data and models. The performance of this approach has been
compared in the literature [L86b, L87] with the scheme that consists of using models estimated using
the aggregated low frequency data; as could be expected due to the resulting smaller sample size,
this method yields a performance that is strictly inferior to the one based on working in the pure high
frequency setup.

In this section, we introduce and compute the performance of a hybrid recipe that consists of
first estimating the model using the high frequency data so that we can take advantage of larger sample
sizes and of the possibility of updating the model as new high frequency data become available. This
model and the data used to estimate it are subsequently aggregated and used for forecasting. We will
refer to this approach as the hybrid forecasting scheme. The main goal in the following pages is
to write down explicitly the total MSFE associated to this forecasting strategy so that we can compare
it using Theorem 4.7 with the one obtained with the method based exclusively on the use of high
frequency data and models. In the next section we use the resulting formulas in order to prove that there
are situations in which the hybrid forecasting scheme provides optimal performance for various kinds of
temporal aggregation.

The main tool at the time of computing the MSFE associated to the hybrid scheme is again the use
of the Delta Method [Ser80] in order to establish the asymptotic normality of the estimation scheme
resulting from the combination of high frequency data with the subsequent model temporal aggregation.

In order to make this more explicit, consider a time series model $X$ determined by the parameters
$\beta_X$ for which an asymptotically normal estimator $\widehat{\beta}_X$ is available, that is, there exists a covariance
matrix $\Sigma_{\beta_X}$ such that

$$\sqrt{T}\left(\widehat{\beta}_X - \beta_X\right) \xrightarrow[T \to \infty]{\mathrm{dist}} N\left(0, \Sigma_{\beta_X}\right),$$

with $T$ being the sample size on which the estimation is based. Now, let $K \in \mathbb{N}$ be an aggregation
period, $w \in \mathbb{R}^K$ an aggregation vector, and $Y := I_w \circ p_K(X)$ the linear temporally aggregated process
corresponding to $X$ and $w$.

Proposition 4.10 In the setup that we just described, suppose that the temporally aggregated process
$Y$ is also a parametric time series model and that the parameters $\beta_Y$ that define it can be expressed
as a function $\beta_Y(\beta_X)$ of the parameters $\beta_X$ that determine $X$. Using the estimator $\widehat{\beta}_X$ we can
construct an estimator $\widehat{\beta}_Y$ for $\beta_Y$ based on disaggregated $X$ samples by setting $\widehat{\beta}_Y := \beta_Y(\widehat{\beta}_X)$.
Then

$$\sqrt{T}\left(\widehat{\beta}_Y - \beta_Y\right) \xrightarrow[T \to \infty]{\mathrm{dist}} N\left(0, \Sigma_{\beta_Y}\right), \tag{4.13}$$

where $T$ is the disaggregated sample size and

$$\Sigma_{\beta_Y} = J_{\beta_Y} \Sigma_{\beta_X} J_{\beta_Y}^T, \tag{4.14}$$

with $(J_{\beta_Y})_{ij} = \dfrac{\partial (\beta_Y)_i}{\partial (\beta_X)_j}$ the Jacobian matrix corresponding to the function $\beta_Y(\beta_X)$.
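The covariance propagation in (4.14) can be illustrated in the simplest case. The sketch below is an assumption-laden example, not part of the construction in the text: it uses the standard facts that the stock aggregation of an AR(1) model with coefficient $\phi$ is an AR(1) model with coefficient $\phi^K$ and that the asymptotic variance of the AR(1) coefficient estimator is $1 - \phi^2$. The helper names and the finite-difference Jacobian are hypothetical choices.

```python
import numpy as np

def numerical_jacobian(f, beta, eps=1e-6):
    """Central finite-difference Jacobian of f at beta (rows: outputs, columns: inputs)."""
    beta = np.asarray(beta, dtype=float)
    f0 = np.atleast_1d(f(beta))
    J = np.zeros((f0.size, beta.size))
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        J[:, j] = (np.atleast_1d(f(beta + step)) - np.atleast_1d(f(beta - step))) / (2 * eps)
    return J

def aggregated_covariance(f, beta_x, sigma_x):
    """Delta method: Sigma_Y = J Sigma_X J^T as in (4.14)."""
    J = numerical_jacobian(f, beta_x)
    return J @ np.atleast_2d(sigma_x) @ J.T

phi, K = 0.5, 3
beta_y = lambda b: b ** K                 # stock aggregation of an AR(1): phi* = phi^K
sigma_x = np.array([[1 - phi ** 2]])      # asymptotic variance of the AR(1) estimator
sigma_y = aggregated_covariance(beta_y, np.array([phi]), sigma_x)
```

In this scalar case the result agrees with the closed form $(K \phi^{K-1})^2 (1 - \phi^2)$.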

Once the model temporal aggregation function and its Jacobian have been determined, this proposition
can be used to formulate an analog of Theorem 4.7 for the hybrid forecasting scheme by mimicking
the proof of Theorem 3.3; the only necessary modification consists of replacing the asymptotic covariance
matrix of the estimator for the disaggregated model by that of the aggregated model $\Sigma_{\beta_Y}$
obtained using Proposition 4.10.

We make this statement explicit in the following theorem and then describe how to compute the
model aggregation function $\beta_Y(\beta_X)$ and its Jacobian $J_{\beta_Y}$ in order to make it fully functional. The
construction of these objects is carried out in the ARMA context where the model aggregation question
has already been fully studied. Even though all necessary details will be provided later on in the section,
all we need to know at this stage in order to state the theorem is that the linear temporal aggregation
of an ARMA(p,q) model is another ARMA(p,q*) model where

$$q^* := \left\lfloor \frac{K(p+1) + q - p - K^*}{K} \right\rfloor, \tag{4.15}$$

$K$ is the temporal aggregation period, $K^*$ is the index of the first nonzero entry of the aggregation vector,
and the symbol $\lfloor \cdot \rfloor$ denotes the integer part of its argument. We emphasize that if the innovations that
drive the disaggregated model are independent with variance $\sigma^2$, this is not necessarily the case for the
resulting aggregated model, whose innovations may be only uncorrelated with a different variance $\sigma_*^2$,
making it into a so called weak ARMA model.



Theorem 4.11 (Hybrid forecasting of linear temporal aggregates) Let $\xi_T = \{x_1, \ldots, x_T\}$ be a
sample obtained as a realization of a causal and invertible ARMA(p,q) model $X$ as in (3.1). Let $w =
(w_1, \ldots, w_K)'$ be a temporal aggregation vector such that $T = MK$, for some $M \in \mathbb{N}$, and let $Y =
I_w \circ p_K(X)$ be the temporal aggregation of the model $X$ and $\eta_M := \{y_1, \ldots, y_M\}$ the temporal aggregated
sample obtained out of $\xi_T$.

We forecast the value of the temporal aggregate $Y_{M+1} = X^w_{T+K}$ out of the sample $\eta_M$ by first estimating
the parameters $\beta_X$ of the model $X$ using another disaggregated sample $\xi'_T$ of the same size, that we
assume to be independent of $\xi_T$. Let $\beta_Y(\beta_X)$ be the function that relates the ARMA parameter values
of the disaggregated and the aggregated model and let $J_{\beta_Y}$ be its Jacobian. Consider the ARMA(p,q*)
model, with q* as in (4.15), determined by the parameters $\widehat{\beta}_Y := \beta_Y(\widehat{\beta}_X)$. Then:

(i) The optimal forecast $\widehat{Y}_{M+1}$ of the temporal aggregate $Y_{M+1} = X^w_{T+K}$ based on the information set
$\mathcal{F}_M := \sigma(I_K \cup \eta_M)$ using Proposition 4.4 and the estimated parameters $\widehat{\beta}_Y$ is given by

$$\widehat{Y}_{M+1} = \sum_{j=h}^{T+r^*} \widehat{\psi}_j \widetilde{\varepsilon}_{T+h-j}, \tag{4.16}$$

where $I_K$ is the preset obtained out of the temporal aggregation of $I$, $r^* = \max\{p, q^*\}$, and
$\widetilde{\varepsilon}_t := \sum_{j=0}^{t+r^*-1} \widehat{\pi}_j y_{t-j}$, with $\widehat{\Psi} = \{\widehat{\psi}_0, \widehat{\psi}_1, \ldots\}$ and $\widehat{\Pi} = \{\widehat{\pi}_0, \widehat{\pi}_1, \ldots\}$ the parameters corresponding
to the causal and invertible representations of the temporally aggregated ARMA model with
parameters $\widehat{\beta}_Y$.

(ii) Let $\Xi_P := (\psi_1, \ldots, \psi_P, \pi_1, \ldots, \pi_P)'$ with $P = T + r^*$. Discarding higher order terms in $1/\sqrt{T}$, the
MSFE corresponding to the forecast (4.16) can be approximated by



MSFE ( Y, 



AI+l 



■ P P P-iP-i-j 

i—h i—h j—0 k—0 



P P-iP-i-j P P-i' P-i'-j' 

X! X! X! '^i'^k^i'^k'{^=.p)] + P.j' + p5i+j+k,t'+j' + k' 

1=1 j=0 fe=0 i' = l j'=0 k'=0 



(4.17) 



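where $\sigma_*^2$ is the variance of the innovations of the aggregated ARMA(p,q*) model $Y$ and $\Sigma_{\Xi_P}$ is
the covariance matrix given by

$$\Sigma_{\Xi_P} = J_{\Xi_P} \Sigma_{\beta_Y} J_{\Xi_P}^T = J_{\Xi_P} J_{\beta_Y} \Sigma_{\beta_X} J_{\beta_Y}^T J_{\Xi_P}^T,$$

with $J_{\beta_Y}$ the Jacobian matrix corresponding to the function $\beta_Y(\beta_X)$ and $J_{\Xi_P}$ the Jacobian of
$\Xi_P(\beta_Y) := (\psi_1(\beta_Y), \ldots, \psi_P(\beta_Y), \pi_1(\beta_Y), \ldots, \pi_P(\beta_Y))'$.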

As we announced above, we conclude this section by describing in detail the parameter aggregation
function $\beta_Y(\beta_X)$ and its Jacobian $J_{\beta_Y}$, so that all the ingredients necessary to apply formula (4.17)
are available to the reader. In order to provide explicit expressions regarding these two elements, we
provide a brief review containing strictly the results on the temporal aggregation of ARMA processes
that are necessary for our discussion; for more ample discussions about this topic we refer the reader
to [AW72, Tia72, Bre73, TW76, Wei79, SW86, Wei06, SV08] and references therein.

The temporal aggregation function $\beta_Y(\beta_X)$. Consider the ARMA(p,q) model $\Phi(L) X = \Theta(L) \varepsilon$,
where $\Phi(L) = 1 - \sum_{i=1}^{p} \phi_i L^i$ and $\Theta(L) = 1 + \sum_{i=1}^{q} \theta_i L^i$. In order to simplify the discussion and to
avoid hidden periodicity phenomena, we place ourselves in a generic situation in which the autoregressive
and moving average polynomials of the model that we want to temporally aggregate have no common
roots and all roots are different (see [Wei06] for the general case). Consider now a K-period aggregation
vector $w = (w_1, \ldots, w_K)'$. Our first goal is to find polynomials $T(z)$ and $\Phi^*(z)$ that satisfy

$$T(L) \Phi(L) = \Phi^*(L^K) \circ \Pi_K \circ \sum_{i=1}^{K} w_i L^{K-i}, \tag{4.18}$$

with $\Pi_K$ as in (4.3). The intuition behind (4.18) is that for any time series $X$,
its temporal aggregation $Y = \Pi_K \circ \sum_{i=1}^{K} w_i L^{K-i}(X)$ satisfies

$$T(L) \Phi(L) X = \Phi^*(L^K) Y. \tag{4.19}$$
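In other words, the polynomial $T(L)$, that we will call the temporal aggregation polynomial, transforms
the AR polynomial for $X$ into an AR polynomial for $Y$ in the aggregated time scale units. Let

$$T(z) = t_0 + t_1 z + \cdots + t_n z^n \quad \text{and} \quad \Phi^*(z^K) = 1 - \phi^*_1 z^K - \cdots - \phi^*_c z^{Kc}$$

be the unknown polynomials. Equation (4.18) can be written in matrix form as [Bre73]:

$$A Z = D, \quad A = \begin{pmatrix} t_0 & 0 & \cdots & 0 \\ t_1 & t_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ t_p & t_{p-1} & \cdots & t_0 \\ \vdots & \vdots & & \vdots \\ t_n & t_{n-1} & \cdots & t_{n-p} \\ 0 & t_n & \cdots & t_{n-p+1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & t_n \end{pmatrix}, \quad Z = \begin{pmatrix} 1 \\ -\phi_1 \\ \vdots \\ -\phi_p \end{pmatrix}, \quad D = \begin{pmatrix} \widetilde{w} \\ -\phi^*_1 \widetilde{w} \\ \vdots \\ -\phi^*_c \widetilde{w} \end{pmatrix}, \tag{4.20}$$

where $\widetilde{w} = (w_K, w_{K-1}, \ldots, w_1)'$ is the reflection of $w$. We start by determining the unknown values $n$
and $c$ using the two following dimensional restrictions: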

• Since $A$ is a matrix of size $(n + p + 1) \times (p + 1)$, $Z$ and $D$ are vectors of size $p + 1$ and $cK + K$,
respectively, and we have $AZ = D$ then, necessarily

$$n + p + 1 = cK + K. \tag{4.21}$$

• The system $AZ = D$ contains $n + p + 1$ equations that need to coincide with the number of
unknowns, that is, the $n + 1 + c$ coefficients $(t_0, t_1, \ldots, t_n)$ and $(\phi^*_1, \ldots, \phi^*_c)$ of the polynomials
$T(z)$ and $\Phi^*(z)$, respectively. Consequently,

$$n + p + 1 = n + c + 1. \tag{4.22}$$

The conditions (4.21) and (4.22) yield

$$c = p, \qquad n = pK + K - p - 1 = (p + 1)(K - 1). \tag{4.23}$$

The first condition in (4.23) shows that the autoregressive order does not change under temporal
aggregation.

Let now $K^* \leq K$ be the index of the first nonzero component in the vector $w$. This implies that
$\widetilde{w}$ is a vector of the form $\widetilde{w} = (w_K, w_{K-1}, \ldots, w_{K^*}, 0, \ldots, 0)'$ with $w_{K^*} \neq 0$. Since (4.20)
is a matrix representation of the polynomial equality in (4.18), we hence have that $\deg(T(L) \Phi(L)) =
\deg(D(L))$, where $D(L)$ is the polynomial associated to the vector in the right hand side of (4.20).
It is clear that $\deg(D(L)) = K(p + 1) - K^*$; as $\deg(T(L) \Phi(L)) = n + p$, the degree $n$ of $T(L)$ is
therefore

$$n = K(p + 1) - p - K^*. \tag{4.24}$$
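The linear system behind the temporal aggregation polynomial can be solved numerically. The following sketch is only an illustration under stated assumptions: it treats the coefficients $(t_0, \ldots, t_n)$ and $(\phi^*_1, \ldots, \phi^*_p)$ as the unknowns of the coefficient identity between $T(L)\Phi(L)$ and $\Phi^*(L^K)$ times the reflected aggregation polynomial, uses the degree $n$ in (4.24), and assumes that the resulting square system is nonsingular. The function name `aggregation_polynomials` is hypothetical.

```python
import numpy as np

def aggregation_polynomials(phi, w):
    """Solve for (t_0, ..., t_n) and (phi*_1, ..., phi*_p) given the AR coefficients and w."""
    phi = np.asarray(phi, dtype=float)
    w = np.asarray(w, dtype=float)
    p, K = phi.size, w.size
    k_star = int(np.flatnonzero(w)[0]) + 1   # index of the first nonzero entry of w
    n = K * (p + 1) - p - k_star             # degree of T(L), see (4.24)
    Z = np.concatenate(([1.0], -phi))        # coefficients of Phi(L)
    w_ref = w[::-1]                          # reflection of w
    m = n + p + 1                            # number of equations
    M = np.zeros((m, n + 1 + p))
    rhs = np.zeros(m)
    for j in range(n + 1):                   # t_j multiplies L^j * Phi(L)
        M[j:j + p + 1, j] = Z
    for c in range(1, p + 1):                # terms -phi*_c * w_ref moved to the left
        for i in range(K):
            if c * K + i < m:
                M[c * K + i, n + c] = w_ref[i]
    head = min(K, m)
    rhs[:head] = w_ref[:head]                # block of D that does not involve phi*
    sol = np.linalg.solve(M, rhs)
    return sol[:n + 1], sol[n + 1:]

# AR(1) with phi = 0.5: stock aggregation with K = 2 and flow aggregation with K = 3.
t_stock, phi_star_stock = aggregation_polynomials([0.5], [0, 1])
t_flow, phi_star_flow = aggregation_polynomials([0.5], [1, 1, 1])
```

For an AR(1) model both schemes return $\phi^* = \phi^K$; the stock case gives $T(L) = 1 + \phi L$ and the flow case gives $T(L) = (1 + L + L^2)(1 + \phi L + \phi^2 L^2)$.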



Solving the polynomial equalities (4.20), we have found polynomials $T(L)$ and $\Phi^*(L^K)$ such that
the temporally aggregated time series $Y$ satisfies

$$\Phi^*(L^K) Y = T(L) \Phi(L) X. \tag{4.25}$$

Our goal now is to show the existence of a polynomial $\Theta^*(L^K)$ and a white noise $\{\varepsilon^*\} \sim WN(0, \sigma_*^2)$
such that

$$T(L) \Theta(L) \varepsilon_{lK} = \Theta^*(L^K) \varepsilon^*_{lK} \quad \text{for any } l \in \mathbb{Z}. \tag{4.26}$$

This equality, together with (4.25), shows that the temporally aggregated process $Y$ out of the ARMA
process $X$ is a weak ARMA process, as it satisfies the relation

$$\Phi^*(B) Y = \Theta^*(B) \varepsilon^*, \quad \{\varepsilon^*\} \sim WN(0, \sigma_*^2), \tag{4.27}$$

where $B = L^K$, $\Phi^*(B)$ is a polynomial of degree $p$, and $\Theta^*(B)$ a polynomial of degree $q^* := \left\lfloor \frac{n + q}{K} \right\rfloor$ whose
coefficients will be determined in the following paragraphs. We recall that the symbol $\lfloor \cdot \rfloor$ denotes the
integer part of its argument. Indeed, by (4.24) we have that $\deg(T(L) \Theta(L)) = K(p + 1) - p - K^* + q =
n + q$. Additionally, by (4.25)

$$T(L) \Theta(L) \varepsilon = T(L) \Phi(L) X = \Phi^*(L^K) Y. \tag{4.28}$$

Since the right hand side of this expression only involves time steps that are integer multiples of $K$, the
relation (4.30) only imposes requirements on the left hand side at those time steps. Moreover, it is easy
to see that

$$E\left[\left(T(L) \Theta(L) \varepsilon_{lK}\right)\left(T(L) \Theta(L) \varepsilon_{lK+jK}\right)\right] = 0, \tag{4.29}$$

for any $Kj > K(p + 1) + q - p - K^*$. This implies that the process $\{T(L) \Theta(L) \varepsilon_{lK}\}_{l \in \mathbb{Z}}$ is
$(K(p + 1) + q - p - K^*)$-correlated, which guarantees in turn by [Bro06, Section 3.2] the existence
of a weak MA(q*) representation

$$T(L) \Theta(L) \varepsilon_{lK} = \Theta^*(B) \varepsilon^*_{lK}, \quad l \in \mathbb{Z}, \tag{4.30}$$

where

$$\deg\left(\Theta^*(B)\right) = \left\lfloor \frac{K(p + 1) + q - p - K^*}{K} \right\rfloor =: q^*. \tag{4.31}$$

The coefficients of the polynomial $\Theta^*(B)$ are obtained by equating the autocovariance functions of the
processes on both sides of (4.30) at lags $0, K, 2K, \ldots, q^*K$, which provide $q^* + 1$ nonlinear equations that
determine uniquely the $q^* + 1$ unknowns corresponding to the coefficients $\theta^*_1, \ldots, \theta^*_{q^*}$ of the polynomial
$\Theta^*(B)$ and the variance $\sigma_*^2$ of the white noise of the aggregated model.

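In order to explicitly write down the equations that we just described, let us denote $C(L) :=
T(L) \Theta(L)$ as in (4.30) and set $C(L) = \sum_{i=0}^{n+q} c_i L^i$. Let now $\gamma$ and $\Gamma$ be the autocovariance functions
of the MA(n + q) and MA(q*) processes

$$V_t = C(L) \varepsilon_t \quad \text{and} \quad U_\tau = \Theta^*(B) \varepsilon^*_\tau, \text{ respectively},$$

which are given by

$$\gamma(j) = \sigma^2 \sum_{l=0}^{n+q-j} c_l c_{l+j} \tag{4.32}$$

and

$$\Gamma(j) = \sigma_*^2 \sum_{l=0}^{q^*-j} \theta^*_l \theta^*_{l+j}, \tag{4.33}$$

with $\theta^*_0 = 1$. Consequently, the coefficients of the polynomial $\Theta^*(B)$ and the variance $\sigma_*^2$ are uniquely determined
by the $q^* + 1$ equations

$$\sigma^2 \sum_{l=0}^{n+q-jK} c_l c_{l+jK} = \sigma_*^2 \sum_{l=0}^{q^*-j} \theta^*_l \theta^*_{l+j}, \quad j = 0, 1, \ldots, q^*. \tag{4.34}$$

The equations (4.34) can be written down in matrix form, which is convenient later on at the time of
spelling out the Jacobian of the aggregation function. Indeed, we can write:

$$\gamma(i) = \sigma^2 \left\langle \overline{T(L) \Theta(L)}, S_i \overline{T(L) \Theta(L)} \right\rangle = \sigma^2 \left\langle \overline{C(L)}, S_i \overline{C(L)} \right\rangle, \tag{4.35}$$

$$\Gamma(i) = \sigma_*^2 \left\langle \overline{\Theta^*(B)}, S_i \overline{\Theta^*(B)} \right\rangle, \tag{4.36}$$

where the bars over the polynomials in the previous expressions denote the corresponding coefficient
vectors, that is, given a polynomial $q(x) = \sum_{i=0}^{n} a_i x^i$ then $\overline{q(x)} = (a_0, a_1, \ldots, a_n)'$. Additionally $S_i$ is
the lower $i$th-shift matrix, that is, $(S_i)_{jk} = \delta_{j-k,i}$.
For any given vector $v = (v_1, \ldots, v_n)'$, $S_i v = (0, \ldots, 0, v_1, \ldots, v_{n-i})'$. With this notation, the equations
(4.34) can be rewritten as

$$\sigma^2 \left\langle \overline{T(L) \Theta(L)}, S_{jK} \overline{T(L) \Theta(L)} \right\rangle = \sigma_*^2 \left\langle \overline{\Theta^*(B)}, S_j \overline{\Theta^*(B)} \right\rangle, \quad j = 0, 1, \ldots, q^*. \tag{4.37}$$

In conclusion, if we denote $\beta_X = (\Phi, \Theta)$ and $\beta_Y = (\Phi^*, \Theta^*)$, the construction that we just examined
shows that

$$\beta_Y(\beta_X) = \left(\Phi^*(\Phi), \Theta^*(\Phi, \Theta)\right). \tag{4.38}$$

The function $\Phi^*(\Phi)$ is given by the solution of the polynomial equalities (4.20) and $\Theta^*(\Phi, \Theta)$ by the
coefficients $(t_0, t_1, \ldots, t_n)$ determined by (4.20) and the solutions of (4.37).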
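When $q^* = 1$ the autocovariance matching equations can be solved in closed form. The sketch below is a minimal illustration under stated assumptions: it takes the coefficient vector of $C(L) = T(L)\Theta(L)$ as given, matches the autocovariances at lags $0$ and $K$, and selects the invertible root, which is a convention adopted here to single out one solution. The function name `aggregated_ma1` and the numerical example (an AR(1) model with $\phi = 0.5$ under flow aggregation with $K = 2$, for which $C(L) = 1 + 1.5L + 0.5L^2$) are hypothetical choices.

```python
import numpy as np

def aggregated_ma1(c, K, sigma2=1.0):
    """Match autocovariances at lags 0 and K when q* = 1; returns (theta*_1, sigma_*^2)."""
    c = np.asarray(c, dtype=float)
    gamma0 = sigma2 * float(c @ c)            # gamma(0)
    gammaK = sigma2 * float(c[:-K] @ c[K:])   # gamma(K)
    if gammaK == 0.0:
        return 0.0, gamma0
    rho = gammaK / gamma0
    # Root of rho * theta^2 - theta + rho = 0 with |theta| < 1 (invertible choice).
    theta = (1 - np.sqrt(1 - 4 * rho ** 2)) / (2 * rho)
    return theta, gammaK / theta

c = [1.0, 1.5, 0.5]   # coefficients of C(L) for the AR(1) flow example with K = 2
theta_star, sigma2_star = aggregated_ma1(c, K=2)
```

The returned pair satisfies $\sigma_*^2 (1 + (\theta^*_1)^2) = \gamma(0)$ and $\sigma_*^2 \theta^*_1 = \gamma(K)$; for larger $q^*$ the system is nonlinear in several unknowns and would require a numerical solver.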

Example 4.12 Stock temporal aggregation of an ARMA(p,q) model. In this case, $w = (0, \ldots, 0, 1)'$
and hence $K^* = K$, $n = p(K - 1)$, and $q^* = \left\lfloor \frac{p(K - 1) + q}{K} \right\rfloor$.

Example 4.13 Flow temporal aggregation of an ARMA(p,q) model. In this case, $w = (1, \ldots, 1)'$ and
hence $K^* = 1$, $n = (p + 1)(K - 1)$, and $q^* = \left\lfloor \frac{(p + 1)(K - 1) + q}{K} \right\rfloor$.
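The orders in the two examples follow from (4.24) and (4.31) and are easy to tabulate. The helper below is a small sketch; the name `aggregated_orders` is hypothetical.

```python
def aggregated_orders(p, q, w):
    """Return (n, q*) for an ARMA(p,q) model aggregated with the vector w."""
    K = len(w)
    # K* is the (1-based) index of the first nonzero entry of w.
    k_star = next(i for i, wi in enumerate(w, start=1) if wi != 0)
    n = K * (p + 1) - p - k_star   # degree of the aggregation polynomial, (4.24)
    q_star = (n + q) // K          # MA order of the aggregated model, (4.31)
    return n, q_star

stock_ar1 = aggregated_orders(1, 0, [0, 1])          # AR(1), stock, K = 2
flow_ar1 = aggregated_orders(1, 0, [1, 1])           # AR(1), flow, K = 2
stock_arma21 = aggregated_orders(2, 1, [0, 0, 1])    # ARMA(2,1), stock, K = 3
flow_arma21 = aggregated_orders(2, 1, [1, 1, 1])     # ARMA(2,1), flow, K = 3
```

For instance, an AR(1) model remains an AR(1) model under stock aggregation with $K = 2$, while flow aggregation with the same period produces an ARMA(1,1) model.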



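The Jacobian $J_{\beta_Y}$ of the temporal aggregation function $\beta_Y(\beta_X)$. The goal of the following
paragraphs is the computation of the Jacobian $J_{\beta_Y}$ of the function $\beta_Y(\beta_X) = (\Phi^*(\Phi), \Theta^*(\Phi, \Theta))$
in (4.38). We first compute the Jacobian of the function $\Phi^*(\Phi)$ by taking derivatives with respect to
the components of the vector $\Phi$ on both sides of the equations (4.20) that determine $\Phi^*(\Phi)$. This
results in the following $p$ matrix equations

$$\frac{\partial A}{\partial \phi_i} Z + A \frac{\partial Z}{\partial \phi_i} = \frac{\partial D}{\partial \phi_i}, \tag{4.39}$$

with $i = 1, \ldots, p$, where $A$, $Z$, and $D$ are the matrix and the vectors in the system $AZ = D$ of (4.20) and the derivatives are taken
entrywise: $\partial A / \partial \phi_i$ is obtained by replacing each entry $t_k$ of $A$ by $\partial t_k / \partial \phi_i$,

$$\frac{\partial Z}{\partial \phi_i} = \left(0, -\frac{\partial \phi_1}{\partial \phi_i}, \ldots, -\frac{\partial \phi_p}{\partial \phi_i}\right)', \qquad \frac{\partial D}{\partial \phi_i} = -\left(0', \frac{\partial \phi^*_1}{\partial \phi_i} \widetilde{w}', \ldots, \frac{\partial \phi^*_p}{\partial \phi_i} \widetilde{w}'\right)'.$$

These equations uniquely determine the (1,1)-block

$$\left(\frac{\partial \Phi^*}{\partial \Phi}\right)_{ij} = \frac{\partial \phi^*_i}{\partial \phi_j}, \quad 1 \leq i, j \leq p,$$

of the Jacobian $J_{\beta_Y}$, as well as the derivatives

$$\frac{\partial t_i}{\partial \phi_j} \tag{4.40}$$

that will be needed later on in the computation of the remaining blocks of the Jacobian. Given that
there is no dependence on $\Theta$ of the function $\Phi^*(\Phi)$, the (1,2)-block of the Jacobian is a zero matrix of
size $p \times q$.

The remaining two blocks are computed by using the function $\Theta^*(\Phi, \Theta)$ uniquely determined by
the equations (4.37). Its derivatives are obtained out of a new set of equations resulting from the
differentiation of both sides of this relation, namely,

$$\sigma^2 \left[\left\langle \overline{\frac{\partial T(L)}{\partial (\beta_X)_i} \Theta(L) + T(L) \frac{\partial \Theta(L)}{\partial (\beta_X)_i}}, S_{jK} \overline{T(L) \Theta(L)} \right\rangle + \left\langle \overline{T(L) \Theta(L)}, S_{jK} \overline{\frac{\partial T(L)}{\partial (\beta_X)_i} \Theta(L) + T(L) \frac{\partial \Theta(L)}{\partial (\beta_X)_i}} \right\rangle\right]$$
$$= \frac{\partial \sigma_*^2}{\partial (\beta_X)_i} \left\langle \overline{\Theta^*(B)}, S_j \overline{\Theta^*(B)} \right\rangle + \sigma_*^2 \left[\left\langle \overline{\frac{\partial \Theta^*(B)}{\partial (\beta_X)_i}}, S_j \overline{\Theta^*(B)} \right\rangle + \left\langle \overline{\Theta^*(B)}, S_j \overline{\frac{\partial \Theta^*(B)}{\partial (\beta_X)_i}} \right\rangle\right], \tag{4.41}$$

with $j = 0, 1, \ldots, q^*$ and $i = 1, \ldots, p + q$. We recall that the entries of the vector $\overline{\dfrac{\partial T(L)}{\partial \phi_i}}$ correspond to the values previously obtained in (4.40)
and that $\overline{\dfrac{\partial T(L)}{\partial \theta_j}} = 0$. Expression (4.41) provides $(q^* + 1)(p + q)$ equations that allow us to find the
values of the $(q^* + 1)(p + q)$ unknowns

$$\frac{\partial \theta^*_i}{\partial \phi_r}, \quad \frac{\partial \theta^*_i}{\partial \theta_s}, \quad \frac{\partial \sigma_*^2}{\partial \phi_r}, \quad \frac{\partial \sigma_*^2}{\partial \theta_s}, \quad i = 1, \ldots, q^*, \quad r = 1, \ldots, p, \quad s = 1, \ldots, q. \tag{4.42}$$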
5 Comparison of forecasting efficiencies. Examples. 

In the previous section we proposed a new hybrid scheme for the forecasting of temporal aggregates
coming from ARMA processes. We recall that this strategy consists of first using high frequency
disaggregated data for estimating a model; then we temporally aggregate both the data and the model, and
finally we forecast using these two ingredients. As we announced in the introduction, there are
situations in which our strategy is optimal with respect to the total error, that is, the predictor constructed
following this procedure performs better than the one based exclusively on high frequency data and the
underlying disaggregated model. In this section we give a few examples of specific models for which our
scheme provides optimal efficiency of prediction. Before we proceed, we introduce abbreviations for the
various predictors that we will be working with:

(i) Temporally aggregated multistep predictor (TMS predictor): this is the denomination 
that we use for the forecast of the aggregate that is constructed out of the disaggregated data and 
the underlying disaggregated model estimated on them. 

(ii) Temporally aggregated predictor (TA predictor): this is the forecast based on the use of the
temporally aggregated sample and a model estimated on it.

(iii) Hybrid predictor (H predictor): this is the predictor introduced in Section 4.3 whose performance
is spelled out in Theorem 4.11. In this scheme, a first model is estimated on the disaggregated
high frequency data sample, then the data and the model are temporally aggregated with
an aggregation period that coincides with the forecasting horizon; finally, both the temporally
aggregated model and the sample are used to produce a one-step ahead forecast that amounts to
a prediction of the aggregate we are interested in.

(iv) Optimal hybrid predictor (OH predictor): this predictor is constructed by taking the multistep
implementation of the H predictor that yields the smallest total error. More explicitly,
suppose that the aggregate that we want to forecast involves $K$ time steps; let $\{K_1, \ldots, K_r\}$ be
the positive divisors of $K$ and $\{C_1, \ldots, C_r\}$ the corresponding quotients, that is, $K = K_i C_i$ for
each $i \in \{1, \ldots, r\}$. There are aggregation schemes (stock and flow for example) for which a
$K$-temporal aggregate can be obtained as the aggregation of $C_i$ $K_i$-temporal aggregates, for all
$i \in \{1, \ldots, r\}$. The total error associated to the forecasting of these aggregates using a multistep
version of the H predictor obviously depends on the factor $K_i$ used. The OH predictor is the one
associated to the factor $K_i$ that minimizes the total error.

As we already mentioned, the forecasting performance of the TMS predictor is always superior or
equal to that of the TA predictor when we take into account only the characteristic error, and it
is strictly superior when the total error is considered. In view of these results and given that the H
and the OH predictors carry out the forecasting with temporally aggregated data, they are going to
produce worse characteristic errors than their TMS counterpart; hence, the only way in which the H
and OH predictors can be competitive in terms of total error performance is by sufficiently lowering
the estimation error. In order to check that they indeed do so, we will place ourselves in situations
that are particularly advantageous in this respect and will choose models for which the TMS and the
TA predictors have identical characteristic errors and hence it is only the estimation error that makes a
difference as to the total error. The linear models for which this coincidence of characteristic errors takes
place have been identified in the works of H. Lütkepohl [L86b, L87, L09] via the following statement.

Theorem 5.1 (Lütkepohl) Let $X_t = \sum_{i=0}^{\infty} \psi_i \varepsilon_{t-i}$ be a linear causal process and let $w = (w_1, \ldots, w_K)' \in \mathbb{R}^K$
be a K-period aggregation vector. Then the TMS and TA predictors for the K-temporal aggregate
determined by $w$ have identical associated characteristic errors if and only if the following identity holds:

$$\left(\sum_{i=0}^{K-1} w_{K-i} L^i\right) \Psi(L) = \left(\sum_{j=0}^{\infty} \left(\sum_{i=0}^{K-1} w_{K-i} \psi_{jK-i}\right) L^{jK}\right) \left(\sum_{j=0}^{K-1} \left(\sum_{i=0}^{j} w_{K-i} \psi_{j-i}\right) L^j\right), \tag{5.1}$$

with the convention $\psi_k = 0$ for $k < 0$.

The equality (5.1) is satisfied for both stock ($w = (0, \ldots, 0, 1)'$) and flow aggregation ($w = (1, \ldots, 1)'$)
if $\{X_t\}$ is a purely seasonal process with period $K$, that is,

$$X_t = \sum_{i=0}^{\infty} \psi_{iK} \varepsilon_{t-iK}. \tag{5.2}$$

Given a specific model we want to compare the performances of the H and the TMS predictors for a
variety of forecasting horizons. Given that condition (5.1) is different for each aggregation period $K$ and
cannot be solved simultaneously for several of them, we will content ourselves either with approximate
solutions that are likely to produce very close H and TMS characteristic errors for several periods $K$
or with exact solutions that provide exactly equal errors for only a prescribed aggregation period. The
following points describe how we have constructed examples following the lines that we just indicated:

• We first choose the orders $p$ and $q$ of the disaggregated ARMA(p,q) model that we want to use as
the basis for the example.

• We fix an aggregation period $K$ and a number $n$ of parameters $\psi_i$ for which the equation (5.1)
will be solved. The choice of $p$ and $q$ imposes a minimal number $n_{min} = q - p + 1$.

• We determine a vector $\Psi^* = (\psi_0, \psi_1, \ldots, \psi_{n-1})'$ that consists of the $n$ first components of the set
$\Psi = \{\psi_0, \psi_1, \ldots\}$ that satisfies condition (5.1). We emphasize that in general this condition does
not determine uniquely the vector $\Psi^*$ and that arbitrary choices need to be made. The vector
$\Psi^*$ is a truncation at order $n - 1$ of the MA representation of the ARMA process that we want
to construct.

• We conclude the construction of the ARMA(p,q) model that we are after by designing either
an AR(p) polynomial $\Phi$ consistent with causality or a MA(q) polynomial $\Theta$ consistent with
invertibility. Then:

- In the first case, the required model is given by

$$\Phi(L) X = \Theta^*(L) \varepsilon, \quad \text{with } \Theta^* = \Phi \cdot \Psi^*. \tag{5.3}$$

- In the second case, the required model is given by

$$\Phi^*(L) X = \Theta(L) \varepsilon, \quad \text{with } \Phi^* = (\Psi^*)^{-1} \cdot \Theta. \tag{5.4}$$

In both cases, the MA and AR polynomials that are obtained in this way have to be checked
regarding invertibility and causality, respectively. Additionally, the finite truncation of $\Psi$ is likely
to give rise to common roots between the AR and MA polynomials in (5.3) or in (5.4), which may
make a slight perturbation of the coefficients necessary in order to avoid them.

We emphasize that the resulting ARMA model satisfies (5.1) only approximately and hence the
characteristic errors of the two predictors will not be identical but just close to each other for the
specific aggregation period $K$ used. For pure MA models no truncation is necessary and hence
exact equality can be achieved.






5.1 Stock aggregation examples 



In the particular case of stock temporal aggregation, condition (5.1) is written as:

$$\sum_{j=0}^{\infty} \psi_j L^j = \left(\sum_{j=0}^{\infty} \psi_{jK} L^{jK}\right) \left(\sum_{j=0}^{K-1} \psi_j L^j\right). \tag{5.5}$$

We now consider the truncated vector $\Psi^*$ with $n$ components, that is, $\Psi^* = (\psi_0, \ldots, \psi_{n-1})'$. Then, the
truncated version of (5.5) is:

$$\sum_{j=0}^{n-1} \psi_j L^j = \left(\sum_{i=0}^{K-1} \psi_i L^i\right) \left(\sum_{j=0}^{\lfloor (n-K+1)/K \rfloor} \psi_{jK} L^{jK}\right), \tag{5.6}$$

where the symbol $\lfloor \cdot \rfloor$ denotes the integer part of its argument. We now provide a few examples of
models whose specification is obtained following the approach proposed in the previous subsection and
the relation (5.6).

Example 5.2 MA(IO) model. 

Let p — Q^ q — IQ, n — rimm — 11 and let K — 2. Equation (5.6 1 becomes 



10 



which imposes the following relations: 



i = 0, 



,5; ipi — for i > n. 



This system of nonlinear equations has many solutions. We choose one of them by setting $\psi_j = 0$, for
$j = 1, \ldots, 9$, and $\psi_{10} = 0.3$. This way we can trivially determine an MA(10) model which satisfies exactly
the relation (5.5) by taking $\theta_j = 0$ for $j = 1, \ldots, 9$ and $\theta_{10} = 0.3$.

Figure 1 shows the values of the characteristic errors for different values of the forecasting horizon
for the TMS predictor, the H predictor, and the OH predictor. For the horizon $h = 2$, the values of the
characteristic errors of all the predictors coincide, which is a consequence of the fact that the model has
been constructed using the relation (5.5) with $K = 2$. Moreover, it is easy to see by looking at (5.2)
that the particular choice of MA coefficients that we have adopted ensures that the resulting model is
seasonal for the periods 2, 5, and 10; this guarantees that (5.5) is also satisfied for the corresponding
values of $K$ and hence the characteristic errors coincide at those horizons too.
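
The claim about the periods 2, 5 and 10 can be checked numerically. The Python sketch below assumes that the stock condition (5.5) amounts to the factorization $\Psi(L) = \left(\sum_j \psi_{jK} L^{jK}\right)\left(\sum_{j<K} \psi_j L^j\right)$; the helper name `satisfies_stock_condition` is hypothetical and is introduced only for this illustration.

```python
import numpy as np

def satisfies_stock_condition(psi, K, tol=1e-12):
    """Check psi(L) == (sum_j psi[jK] L^{jK}) * (sum_{j<K} psi[j] L^j).

    psi holds the weights psi_0, psi_1, ... of a finite MA model.
    Reading condition (5.5) as this factorization is an assumption of the sketch.
    """
    psi = np.asarray(psi, dtype=float)
    # First factor: keep only the weights at lags that are multiples of K.
    subsampled = np.zeros_like(psi)
    subsampled[::K] = psi[::K]
    # Second factor: the first K weights psi_0, ..., psi_{K-1}.
    head = psi[:K]
    product = np.convolve(subsampled, head)
    # Pad psi with zeros so that both coefficient vectors have the same length.
    padded = np.zeros_like(product)
    padded[: len(psi)] = psi
    return bool(np.allclose(product, padded, atol=tol))

# MA(10) model of Example 5.2: psi_0 = 1, psi_10 = 0.3, all other weights zero.
psi = np.zeros(11)
psi[0], psi[10] = 1.0, 0.3
valid_periods = [K for K in range(2, 11) if satisfies_stock_condition(psi, K)]
```

Under this reading, `valid_periods` contains exactly the aggregation periods for which the factorization holds for the chosen MA(10) weights.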

The total errors for a sample size of $T = 50$ are then computed using the formulas presented in
Sections 3 and 4. The corresponding results are also plotted in Figure 1 for the different forecasting
schemes. This plot shows that for several forecasting horizons both the H and the OH predictors perform
better than the TMS predictor.

A quick inspection of this plot reveals another interesting phenomenon, namely the decrease of
the total error associated with the three predictors as the forecasting horizon increases; this feature is due
to the decrease of the estimation error under these forecasting schemes as the horizon becomes longer.
The characteristic errors for the H and OH predictors do not increase monotonically with the forecasting
horizon either; in this case, however, this is because for each value of the forecasting horizon
these predictors are constructed using a different model: the aggregation period changes and hence
so does the aggregated model used for forecasting.
In conclusion, in this particular example, both the H and the OH predictors exhibit a better
forecasting performance than the TMS predictor and, additionally, the results regarding the OH predictor
help in deciding which aggregation period minimizes the associated total forecasting error.




Figure 1: Characteristic and total errors, plotted against the forecasting horizon (1 to 10), associated with the forecast of the temporal stock aggregate of the MA(10) model.
The sample size used for estimation is $T = 50$. The innovations of the model have variance $\sigma^2 = 5$.



Example 5.3 ARMA(3,11) model. 



Let $p = 3$, $q = 11$, $n = 10$ and let $K = 3$. In this case, the relation (5.6) yields

$$\sum_{j=0}^{9} \psi_j L^j = \left(\sum_{j=0}^{2} \psi_{3j} L^{3j}\right)\left(\sum_{j=0}^{2} \psi_j L^j\right)$$

and consequently

$$\psi_0 = 1, \quad \psi_j \psi_{3i} = \psi_{3i+j}, \quad i = 0, \ldots, 2, \ j = 1, 2, \qquad \psi_i = 0 \ \text{for } i > n - 1.$$

We choose a solution for these relations of the form $\Psi^* = (1, -0.9, 0.8, 0, 0, 0, -0.7, 0.63, -0.56, 0)'$.
We now introduce an AR(3) polynomial of the form $\Phi = (-0.9, 0.8, -0.4)'$. We then determine
the MA(11) part of the model by using (5.3), which yields the coefficients $\Theta = (-1.8, 2.41, -1.84,
1, -0.32, -0.7, 1.26, -1.687, 1.288, -0.7, 0.224)'$. In order to avoid the common roots between the AR
and the MA polynomials that are obtained when the coefficients of the MA part are derived in this
manner, we slightly perturb the values of some of the components of the vector $\Theta$, which we now set to be
$\Theta = (-1.8, 2.4102, -1.8403, 1, -0.32, -0.7, 1.26, -1.687, 1.288, -0.7, 0.224)'$. Figure 2 shows the errors
with respect to the forecasting horizon for all the predictors as in the previous example. The H and the
OH predictors perform better than the TMS predictor for $h = 3, 6, 9, 10$. Additionally, the OH predictor
performs better than the H predictor for $h = 4$. Taking into account the initial choice of $K = 3$ when
constructing the example, it becomes clear why the characteristic errors associated with the H and the
OH predictors are very close to those associated with the TMS predictor for horizons $h$ that are multiples
of 3.
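
The numbers of this example can be reproduced with a few lines of Python. The sketch below checks the relations $\psi_j \psi_{3i} = \psi_{3i+j}$ (the form in which the constraints are read here) and then multiplies the AR polynomial by $\Psi^*$ as in (5.3). One assumption is worth stating loudly: the listed MA coefficients are recovered only when the entries of $\Phi$ are read directly as the coefficients of $z$, $z^2$, $z^3$, that is, $\Phi(z) = 1 - 0.9z + 0.8z^2 - 0.4z^3$.

```python
import numpy as np

# Truncated weights Psi* chosen in Example 5.3 (K = 3, n = 10).
psi_star = np.array([1, -0.9, 0.8, 0, 0, 0, -0.7, 0.63, -0.56, 0])

# Relations psi_j * psi_{3i} = psi_{3i+j} for i = 0, 1, 2 and j = 1, 2.
relations_hold = all(
    np.isclose(psi_star[j] * psi_star[3 * i], psi_star[3 * i + j])
    for i in range(3)
    for j in (1, 2)
)

# AR polynomial read as 1 - 0.9 z + 0.8 z^2 - 0.4 z^3 (assumed sign convention).
ar_poly = np.array([1, -0.9, 0.8, -0.4])

# (5.3): the MA polynomial is the product of the AR polynomial and Psi*.
ma_poly = np.convolve(ar_poly, psi_star)
theta = ma_poly[1:12]  # theta_1, ..., theta_11; the z^12 coefficient vanishes
```

The unperturbed product shares its roots with the AR polynomial by construction, which is why the text perturbs two of the coefficients afterwards.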




Figure 2: Characteristic and total errors, plotted against the forecasting horizon, associated with the forecast of the temporal stock aggregate of the ARMA(3,11)
model. The sample size used for estimation is $T = 50$. The innovations of the model have variance $\sigma^2 = 5$.



Example 5.4 ARMA(1,4) model. 



Let $p = 1$, $q = 4$, $n = 5$ and let $K = 4$. In this setup, relation (5.6) yields

$$\sum_{j=0}^{4} \psi_j L^j = \psi_0 \sum_{j=0}^{3} \psi_j L^j$$

and consequently $\psi_0 = 1$ and $\psi_4 = 0$, necessarily, while the values of the coefficients $\psi_1$, $\psi_2$, and
$\psi_3$ are not subjected to any constraint. We hence set $\Psi^* = (1, 0.3, -0.3, 0.3, 0)'$. We now introduce
the AR(1) polynomial determined by the coefficient $\Phi = 0.8$. We then determine the MA(4) part
of the model by using (5.3), which yields $\Theta = (-0.5, -0.54, 0.54, -0.24)'$. Again, in order to avoid
common roots between the AR and the MA polynomials, we perturb the polynomial by setting
$\Theta = (-0.5, -0.5403, 0.54, -0.24)'$.
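
The step that turns $\Psi^*$ and the AR coefficient into MA coefficients is a polynomial product, which can be sketched in Python as follows. The function name `ma_coefficients` is hypothetical, and the sketch assumes the convention $\Phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$, which is the one that reproduces the numbers of this example.

```python
import numpy as np

def ma_coefficients(phi, psi_star):
    """Return theta_1, ..., theta_q from Theta(z) = Phi(z) * Psi*(z).

    phi: AR coefficients, with Phi(z) = 1 - phi_1 z - ... - phi_p z^p (assumed).
    psi_star: truncated weights (psi_0, ..., psi_{n-1}) with psi_0 = 1.
    """
    ar_poly = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))
    ma_poly = np.convolve(ar_poly, np.asarray(psi_star, dtype=float))
    # Drop the leading 1 and any trailing zero coefficients.
    return np.trim_zeros(ma_poly[1:], "b")

# Example 5.4: AR(1) coefficient 0.8 and Psi* = (1, 0.3, -0.3, 0.3, 0)'.
theta = ma_coefficients([0.8], [1, 0.3, -0.3, 0.3, 0])
```

Because $\Theta(z)$ is built as a multiple of $\Phi(z)$, every root of the AR polynomial is also a root of the MA polynomial; the small perturbation applied in the text removes that exact cancellation.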

Figure 3 shows that the H and the OH predictors have equal associated total errors and exhibit a
better forecasting efficiency than the TMS predictor for all forecasting horizons except at $h = 2$. The
initial choice of $K = 4$ at the model construction stage results in the fact that for $h = 4$ the values of
the characteristic errors associated with the H and the OH predictors are very close to the one committed
by the TMS predictor.
Figure 3: Characteristic and total errors of the TMS, H and OH predictors, plotted against the forecasting horizon, associated with the forecast of the temporal stock aggregate of the ARMA(1,4)
model. The sample size used for estimation is $T = 50$. The innovations of the model have variance $\sigma^2 = 5$.



5.2 Flow aggregation examples 



In the particular case of flow temporal aggregation, condition (5.1) can be written as: 

K-i I oo K~\ \ (k-\ j 

* (^) E = E E v^.--^ E E 

i=o \j=o i=a I \j=a i=o 



(5.7) 



We now consider the truncation $\Psi^*$ of $\Psi$ with $n$ components, that is, $\Psi^* = (\psi_0, \ldots, \psi_{n-1})'$. Then, the
truncated version of (5.7) can be expressed as:



A'-l 



n-l y(n-K+l)/K\ K-1 

E ^1 E ^.^^ = E E ^.--^ 1 1 E E v^.-^ I ' 

, i=0 / j=0 \ j=0 i=0 / \ i=0 1=0 



(5.8) 



where the symbol $\lfloor \cdot \rfloor$ denotes the integer part of its argument. We now provide a few examples of models
whose specification is obtained following the approach described in the beginning of the section and the
relation (5.8).



Example 5.5 MA(10) model.

Let $p = 0$, $q = 10$, $n = n_{\min} = 11$, and let $K = 2$. Then the expression (5.8) reads



E ^1 E ^.L^ - I E E ^^.-^ I ( E E v^.-^ I ' 






and consequently 

-00 = 1, (l + ■)pi)iipi + ipi+i) = ^i+i + 'ipi+2, for 1 = 1,..., 9, and V-'i = for « > n. (5.9) 

A solution for these equations is given by the choice $\psi_j = 0$, for $j = 1, \ldots, 9$, and $\psi_{10} = 0.3$. Since
the order of the AR polynomial is zero, the method that we proposed uniquely determines in this case
the MA(10) polynomial that we are after, with $\Theta = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0.3)'$. The evolution of the
forecasting errors versus the forecasting horizon is plotted in Figure 4. Both the H and the OH
predictors perform better than the TMS predictor. For $h = 4$ the OH predictor has the smallest total
error among the three predictors.




Figure 4: Characteristic and total errors, plotted against the forecasting horizon (1 to 10), associated with the forecast of the temporal flow aggregate of the MA(10) model.
The sample size used for estimation is $T = 50$. The innovations of the model have variance $\sigma^2 = 5$.



Example 5.6 ARMA(3,10) model. 



In the previous example we chose $\psi_{10} \neq 0$. Let us now use another solution of the system (5.9)
in order to obtain another model with target orders $p = 3$ and $q = 10$. If we set $\psi_{10} = 0$, then
a possible solution is $\Psi^* = (1, -0.5, 0.45, -0.475, 0.3, -0.3875, 0.1, -0.2438, 0, 0, 0)'$. Let now $\Phi =
(0.21, 0.207, 0.0162)'$ be a causal AR(3) polynomial which determines via (5.3) the MA(10) polynomial
$\Theta = (-0.71, 0.348, -0.4822, 0.3147, -0.3595, 0.1270, -0.1894, 0.0368, 0.0488, 0.0039)'$. In order
to avoid common roots for the AR and MA polynomials, we perturb the MA coefficients and set
$\Theta = (-0.71, 0.3481, -0.4823, 0.3148, -0.3595, 0.1270, -0.1894, 0.0368, 0.0488, 0.0039)'$. Figure 5 shows
the corresponding characteristic and total errors for the three predictors. The H and the OH predictors
exhibit better performance than the TMS predictor for horizons $h = 2, 4, 5, 6, 7$.
Figure 5: Characteristic and total errors of the TMS, H and OH predictors, plotted against the forecasting horizon, associated with the forecast of the temporal flow aggregate of the ARMA(3,10)
model. The sample size used for estimation is $T = 50$. The innovations of the model have variance $\sigma^2 = 5$.



Remark 5.7 The model that we just presented can be used to illustrate the fact that the construction
method presented in this section is not the only source of examples in which the H and the
OH predictors perform better than the TMS scheme. Indeed, as can be seen in Figure 6, the very
same ARMA(3,10) model prescription used in the context of stock aggregation also shows this feature
even though it has not been obtained by finding a solution of the equation (5.5).



Remark 5.8 Notice that when the forecasting horizon $h$ equals one all predictors coincide because
there is no temporal aggregation, and hence they have the same associated errors.



6 Conclusions 

In this work we have carried out a detailed study of the total error committed when forecasting with
one-dimensional linear models by minimizing the mean square error. We have introduced a new
hybrid scheme for the forecasting of linear temporal aggregates that in some situations shows optimal
performance in comparison with other prediction strategies proposed in the literature.

We work in a finite sample context. More specifically, the forecasting is based on the information 
set generated by a sample and a model whose parameters have been estimated on it and we avoid the 
use of second order stationarity hypotheses or the use of time independent autocovariance functions. 

In this setup, we provide explicit expressions for the forecasting error that incorporate both the error
incurred due to the stochastic nature of the model (we call it the characteristic error) and the one
associated with the sample-based estimation of the model parameters (the estimation error). In order to derive
these expressions we use certain independence and asymptotic normality hypotheses that are customary
in the literature; our main contribution consists of providing expressions for the total error that
require neither stationarity of the samples used nor Monte Carlo simulations to be evaluated.

We subsequently use these formulas to evaluate the performance of a new forecasting strategy that we 
propose for the prediction of linear temporal aggregates; we call it hybrid scheme. This approach consists 
of using high frequency data for estimation purposes and the corresponding temporally aggregated data 
and model for forecasting. This scheme uses all the information available at the time of estimation 
by using the bigger sample size provided by the disaggregated data, and allows these parameters to 
be updated as new high frequency data become available. More importantly, as we illustrate with 
various examples, in some situations the total error committed using this scheme is smaller than the 
one associated to the forecast based on the disaggregated data and model; in those cases our strategy 
becomes optimal. As the increase in performance obtained with our method comes from minimizing
the estimation error, we are persuaded that this approach to forecasting may prove very relevant in the
multivariate setup where in many cases the estimation error is the main source of error.



7 Appendices 



7.1 Proof of Proposition 2.1 



(i) It is a straightforward consequence of the causality and invertibility hypotheses on the ARMA
model that we are dealing with. Indeed, we can write

$$\varepsilon_t = \sum_{j=0}^{t-1+r} \pi_j X_{t-j} \quad \text{and} \quad X_t = \sum_{i=0}^{t-1+r} \psi_i \varepsilon_{t-i}, \qquad (7.1)$$

which proves (2.4).



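(ii) Suppose first that the innovations $\{\varepsilon_t\}$ are IID$(0, \sigma^2)$. Then the forecast $\hat{X}_{T+h}$ that minimizes
the mean square forecasting error $E\left[(X_{T+h} - \hat{X}_{T+h})^2\right]$ is given by the conditional expectation
(see for example [Ham94], page 72):

$$\begin{aligned}
\hat{X}_{T+h} &= E\left[X_{T+h} \mid \sigma(\boldsymbol{\xi}_T)\right] = E\left[X_{T+h} \mid \sigma(\boldsymbol{\varepsilon}_T)\right] = \sum_{i=0}^{T+h-1+r} \psi_i E\left[\varepsilon_{T+h-i} \mid \sigma(\boldsymbol{\varepsilon}_T)\right] \\
&= \sum_{i=h}^{T+h-1+r} \psi_i \varepsilon_{T+h-i} = \sum_{i=0}^{T-1+r} \psi_{i+h} \varepsilon_{T-i} = \sum_{i=0}^{T-1+r} \sum_{j=0}^{T-i-1+r} \psi_{i+h} \pi_j X_{T-i-j},
\end{aligned}$$

as required.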

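When $\{\varepsilon_t\}$ is WN$(0, \sigma^2)$ our goal is finding the linear combination $\sum_{j=0}^{T-1+r} a_j X_{T-j}$ that minimizes

$$E\left[\left(X_{T+h} - \sum_{i=0}^{T-1+r} a_i X_{T-i}\right)^2\right].$$

Given that by (7.1) the elements $X_{T-i}$ can be written as a linear combination of the elements in
$\boldsymbol{\varepsilon}_T$, this task is equivalent to finding the vector $b = (b_0, \ldots, b_{T-1+r})'$ that minimizes the function

$$\begin{aligned}
S(b_0, \ldots, b_{T-1+r}) &= E\left[\left(X_{T+h} - \sum_{i=0}^{T-1+r} b_i \varepsilon_{T-i}\right)^2\right] = E\left[\left(\sum_{i=0}^{T+h-1+r} \psi_i \varepsilon_{T+h-i} - \sum_{i=0}^{T-1+r} b_i \varepsilon_{T-i}\right)^2\right] \\
&= E\left[\left(\sum_{i=0}^{h-1} \psi_i \varepsilon_{T+h-i} + \sum_{i=0}^{T-1+r} (\psi_{i+h} - b_i)\, \varepsilon_{T-i}\right)^2\right] = \sigma^2 \left[\sum_{i=0}^{h-1} \psi_i^2 + \sum_{i=0}^{T-1+r} (\psi_{i+h} - b_i)^2\right].
\end{aligned}$$

Hence, in order to minimize the function $S(b_0, \ldots, b_{T-1+r})$ we compute the partial derivatives
$\partial S / \partial b_i$ and we set them to zero, which shows that the optimal values are attained when $b_i = \psi_{i+h}$.
Consequently, the optimal linear forecast is given by $\hat{X}_{T+h} = \sum_{i=0}^{T-1+r} \psi_{i+h} \varepsilon_{T-i}$, as required.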



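(iii) We first compute $X_{T+h} - \hat{X}_{T+h}$. By (2.5) and (7.1) we have

$$X_{T+h} - \hat{X}_{T+h} = \sum_{i=0}^{h-1} \psi_i \varepsilon_{T+h-i}.$$

Therefore

$$\mathrm{MSFE}\left(\hat{X}_{T+h}\right) = E\left[\left(\sum_{i=0}^{h-1} \psi_i \varepsilon_{T+h-i}\right)^2\right] = \sigma^2 \sum_{i=0}^{h-1} \psi_i^2.$$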
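
The resulting expression, $\sigma^2 \sum_{i=0}^{h-1} \psi_i^2$, is straightforward to evaluate. The following Python sketch does so for an AR(1) model with coefficient 0.5, whose weights are $\psi_i = 0.5^i$; this model and the function name `characteristic_error` are illustrative assumptions and do not come from the paper, while $\sigma^2 = 5$ mirrors the innovation variance used in the figures.

```python
import numpy as np

def characteristic_error(psi, h, sigma2):
    """MSFE of the h-step forecast with known parameters: sigma2 * sum_{i<h} psi_i^2."""
    psi = np.asarray(psi, dtype=float)
    return sigma2 * float(np.sum(psi[:h] ** 2))

# Illustrative AR(1) model with phi = 0.5, so that psi_i = 0.5**i.
psi = 0.5 ** np.arange(20)
errors = [characteristic_error(psi, h, sigma2=5.0) for h in (1, 2, 3)]
```

For a single series the error is nondecreasing in $h$, since each additional step adds a nonnegative term $\sigma^2 \psi_{h-1}^2$.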



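(iv) Given the model $\Phi(L)X = \Theta(L)\varepsilon$, we have

$$X_{T+h} - \phi_1 X_{T+h-1} - \cdots - \phi_p X_{T+h-p} = \varepsilon_{T+h} + \theta_1 \varepsilon_{T+h-1} + \cdots + \theta_q \varepsilon_{T+h-q}.$$

We first recall that by (2.4) we have that $\sigma(\boldsymbol{\xi}_T) = \sigma(\boldsymbol{\varepsilon}_T) = \mathcal{F}_T$. We now project both sides of
this equality onto the information set $\mathcal{F}_T$ by thinking of this $\sigma$-algebra as $\sigma(\boldsymbol{\xi}_T)$ for the left hand
side projection and as $\sigma(\boldsymbol{\varepsilon}_T)$ for the right hand side. We obtain:

$$\hat{X}_{T+h} - \phi_1 \hat{X}_{T+h-1} - \cdots - \phi_p \hat{X}_{T+h-p} = E\left[\varepsilon_{T+h} + \theta_1 \varepsilon_{T+h-1} + \cdots + \theta_q \varepsilon_{T+h-q} \mid \mathcal{F}_T\right] = \begin{cases} \theta_h \varepsilon_T + \cdots + \theta_q \varepsilon_{T+h-q}, & q \geq h, \\ 0, & \text{otherwise.} \end{cases}$$

In the presence of white noise innovations, the conditional expectation in the previous equality
should be replaced by a linear projection. ■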

7.2 Computation of the Jacobian $J_{\Xi_P}$

In this section we provide a simple algorithmic construction for the computation of the Jacobian $J_{\Xi_P}$
when the elements in $\Xi_P$ are considered as a function of $\beta$, that is, $\Xi_P(\beta) := (\psi_1(\beta), \ldots, \psi_P(\beta), \pi_1(\beta), \ldots, \pi_P(\beta))$.
We will separately compute the components $\partial \psi_i / \partial \beta_k$ and $\partial \pi_i / \partial \beta_k$, $i = 1, \ldots, P$, $k = 1, \ldots, p + q$.

The causality and invertibility hypotheses on the ARMA process we are working with guarantee
that for any $P > \max\{p, q\}$ there exist polynomials $\Psi_P(z)$, $\Pi_P(z)$ of order $P$ uniquely determined by
the relations:

$\Phi(z)\Psi_P(z) = \Theta(z)$, (7.2)
$\Phi(z) = \Pi_P(z)\Theta(z)$, (7.3)

which are equivalent to

$\Psi_P(z) = \Phi^{-1}(z)\Theta(z)$,
$\Pi_P(z) = \Phi(z)\Theta^{-1}(z)$.
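
In practice the coefficients of $\Psi_P$ and $\Pi_P$ are obtained by matching powers of $z$ in (7.2) and (7.3) up to order $P$. The Python sketch below does this with a plain recursion; the function name `series_quotient` and the ARMA(1,1) values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def series_quotient(num, den, P):
    """First P + 1 coefficients of the power series num(z) / den(z), with den[0] = 1."""
    num = np.concatenate((np.asarray(num, dtype=float), np.zeros(P + 1)))
    den = np.asarray(den, dtype=float)
    out = np.zeros(P + 1)
    for k in range(P + 1):
        # Match the coefficient of z^k in den(z) * out(z) = num(z).
        acc = num[k]
        for i in range(1, min(k, len(den) - 1) + 1):
            acc -= den[i] * out[k - i]
        out[k] = acc
    return out

# Illustrative ARMA(1,1) with phi = 0.5 and theta = 0.4.
ar_poly = [1.0, -0.5]  # Phi(z) = 1 - 0.5 z
ma_poly = [1.0, 0.4]   # Theta(z) = 1 + 0.4 z
P = 6
psi = series_quotient(ma_poly, ar_poly, P)   # coefficients of Psi_P, from (7.2)
pi_w = series_quotient(ar_poly, ma_poly, P)  # coefficients of Pi_P, from (7.3)
```

A convenient sanity check is that the product of the two truncated series equals 1 up to order $P$, since $\Psi_P$ and $\Pi_P$ are inverse to each other as power series.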

These polynomial relations determine the functions $\Psi_P(\Phi, \Theta)$, $\Pi_P(\Phi, \Theta)$ needed in the computation
of the Jacobian. We now rewrite (7.2) and (7.3) as

$\Phi(z)\Psi_P(\Phi, \Theta)(z) = \Theta(z)$, (7.4)
$\Phi(z) = \Pi_P(\Phi, \Theta)(z)\Theta(z)$. (7.5)



If we take derivatives with respect to $\theta_j$ and $\phi_i$, $j \in \{1, \ldots, q\}$, $i \in \{1, \ldots, p\}$, on both sides of (7.4), we
obtain a set of $p + q$ polynomial equations:

z'*p(0,0) + *(z)^^*|^=O, {!,..., ri, 

that determine uniquely the corresponding entries of the Jacobian due to the invertibility of $\Phi(z)$. At
the same time, taking derivatives on both the right and left hand sides of (7.5) with respect to $\theta_j$ and
$\phi_i$, we obtain another set of $p + q$ polynomial equations



a*, , dUp 
dU 



9&j d&3 



i(E{l,...,p}, 

that determine uniquely the corresponding entries of the Jacobian due to the invertibility of $\Theta(z)$.
7.3 Proof of Theorem 3.3

(i) It is a straightforward consequence of the independence hypothesis between the samples and 



the £,'rp, and part (ii) of Proposition 2.1 



(ii) By (3.1) and part (i) of Proposition 2.1 we have that 



MSFE 



(^Xx+hj — E (^Xt+h — Xf+h 



= E 



i=h 



In order to compute this error notice that ET+h-i can be rewritten in terms of the original inno- 
vations as 

P-i P-i P-i-j 

j=0 j=0 fc=0 



-k- 



(7.6) 



Hence, 

MSFE (X^h) 



E 



P P-iP-i-j 
. i=0 i=/i j=0 fc=0 



E 



CP \ ^ P P P-i P-i-j 

'^tpieT+h-ij ^2^^^ ^ ipiipiTrj'ipkST+h-iST+h-i-j-k 
i=0 ) (=0 i=/i j=0 fe=0 



P P-i P-i-j 
.i=h j=0 k=0 



■ P P P P-i P-i-j 

^ - 2 ^ ^ ^ ^ V'iV'fc^^ V'^Tfj 

.i=?i ;=0 i=h j=0 k=0 



Jl,i+j + k 



P P-iP-i-j P P-i'P-i'-j' 
i=h j=0 k=0 i'=hj'=0 k'=Q 



i+ j +k ,i' + j ' +k' 



h-1 



i=0 



(iii) By part (i) the forecast Xr+h is given by 



Xr+h = E 



h— i • 



i=h 



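According to the statement (3.2), both $\hat{\psi}_i$ and $\hat{\pi}_j$ can be asymptotically written as

$$\hat{\psi}_i = \psi_i + \frac{r_i}{\sqrt{T}} \quad \text{and} \quad \hat{\pi}_j = \pi_j + \frac{t_j}{\sqrt{T}}, \qquad (7.7)$$

with $r_i$ and $t_j$ Gaussian random variables of mean 0 and variances $(\Sigma_{\Xi_P})_{i,i}$ and $(\Sigma_{\Xi_P})_{j+P,\,j+P}$,
respectively. Consequently by (7.7) and (7.6)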

P P P-iP-i-j , 

Xr+h = E ■0ieT+/i-i = E E E ( 

i=?i i=/i j=0 /£=0 ^ 

P P-i P-i-j / 

i=/i j=0 fc=0 ^ 



+ 7/^ ) ^k£T+h-^-J-k 



T 



T 



^T+h-i-j-k- 
We now recall that



P-i p-i~j 

T^j'^k^T+h-i-j-k = £T+h-i, 

j=0 k=0 



and we eliminate in this expression the term that decays as $1/T$; we hence approximate $\hat{X}_{T+h}$ as



X 



T+h 



P-i P-i-j 

E E 

j=0 k=0 

P P-iP-i-j 



ffjiipki', 



E 

i=h 

P t- r — 'Lir—i—J . . 

E7 \ ^ \ ^ \ ^ WiWkl'j 
WiST+h-i + ^ 2.^ J^£T+h-i-]-k- 

i—h i—h j—0 k—0 



^T+h-i-j-k 



fr 



Using this approximation we compute now the MSFE: 



MSFE (StT^) = E 



Xt+h — Xt+h 



= E 



= E 



P P-iP-i~j 



^iST+h-i - Y i^iET+h^z "EE E 



^ h-l 



i—h 
P 



T 



ET+h-i-j-k 



i=h j=Q k=0 

P P-i P-i- j 



^ ^piET+h-i + (-0^ - V'i) ET+h-z -EE E ~7^^T+h-i-j-k 
L 2—0 i—h i — h i—0 k—0 



h-1 



h j=0 k=0 
■ P P P-i P-i- j 

+ 2^ ^ Y ipii^k{^s)i+j+k,j+p 

.i=h i=h j=0 k=0 



P P-i P-i- j P P-i' P-i' -j' 

+ EE E E E E '^^i^k-ipi''^k'{^s)j+P,3'+pSi+j + k,t'+r + k' 
i=h j=0 fc=0 i'=h j'=Q k'=0 

7.4 Proof of Theorem 3.4

The mean square error associated with the forecast $\hat{X}_{T+h}(\hat{\beta})$ carried out using estimated parameters $\hat{\beta}$ is
given by:

MSFE (5t^(3)) =E 



= E 



Xr+h — XT+h{0) 
Xr+h — Xr+hif^) 



= E 



+ E 



{X^,if3)-X^h0) 



2E 

2 



Xr+h - Xr+hifi) + ^t+;,.(/3) - Xr+hifi) 

Xr+h - x^him) f5T^(/3) - x^hm 



(7.8) 



We now recall that 



5t^,(/3) =^V,£T+h-», with P = T + h-l + r, 



i—h 
and notice that



E 



E 



'h~i 



' h-1 



P-i P-i-j 
j=0 fe=o 



since the first term in the product involves the innovations $\{\varepsilon_{T+1}, \ldots, \varepsilon_{T+h}\}$ and the second one
$\{\varepsilon_{1-r}, \ldots, \varepsilon_T\}$; these two sets are disjoint and hence independent. Consequently by (2.6) and (7.
we have

h-l r 

MSFE (5t^(3)) ^a^J2^^+^ (5t^(/3) - 5t^(3)) 



The second summand of this expression can be asymptotically evaluated using (3.6). Indeed, 

2" 



E 



= E 
1 



E 



XT+hil3)-XT+h{f3)] |J"t 

p+q 



1 



^E[n{h)] = -E 



with 



dXT+hif^) d 



P P-k 



P P-k 



dp. 



1 \k=h 1=0 



k=h 1=0 



d^k , , StT; 

dp, dPj 



(7.9) 



T+h-k-l- 



Consequently, 

p+q 

E[n{h)]^ ^ i?[j.j,(E^) 

p+q P P-k P P-m 

-EEEEE 

ij = l k=h 1=0 ni=h n— 



dni \ f dip. 



djSj dp J J \ a Pi 



7r„ + t/i. 



dp. 



{^I3)ij E [XT+h-k-lXT+h-m-r 

(7.10) 

In the presence of the stationarity hypothesis in part (ii) of the theorem we have that

$$E\left[X_{T+h-k-l}\, X_{T+h-m-n}\right] = \gamma(k + l - m - n)$$

and hence (3.9) follows. Otherwise, since we have in general that

t+r—l s+r—1 
E[XtXs]=(y^ Y E ^^^A-^,s-J, 
i=0 j=0 



(7.11) 
we can insert this expression in (7.10) and we obtain



p+q P P-kP-k-l P P-mP-m-n 

E[nm-^'Y.i:i: E E E E 

ij — l k—h l—Q u— m—h n— v—0 
d-Kl d-Kn 



\d(i;j dPi dl3j d(ii dPj dPi 



+ 



ll^klpm (S/3)j ■ i>u1pvSk+l+u,Tn+n+v 



P P-kP-k-l P P-mP-m-r 



'^^EE E EE E ii^Sp)k,mmTrn + i^Sp) k,n+pm'(p,, 
k=h 1=0 u=0 m=h n=0 v=0 

+ {^Sp)m,l+p'4'k'^n + {'^Sp)l+p.n+p'4'kiJm)i'ui'v3k+l+u,m+n+v 



The required identity (3.8) follows directly from (7.12) by noticing that

P-fe P-k-l 



{T^l^u^k+l+u^m+n+v) — Sk,m+n+v, 



1=0 u=0 



and 



P—m P—m—n 

E E 

n=0 v=0 Ll=0 u=0 



P-k P-k-l 
E E i'^l'l^^^' 



P—m P—rn—n 



(7.12) 



n=0 v=0 



7.5 Proof of Proposition 4.4 



First, we notice that by (4.2) we have



The same relation guarantees that X^^i, = Ym+i- Hence the result is a consequence of the following 
general fact: 

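Lemma 7.1 Let $z$ be a random variable in the probability space $(\Omega, P, \mathcal{F})$. Let $\mathcal{F}^*$ be a sub-sigma algebra
of $\mathcal{F}$, that is, $\mathcal{F}^* \subset \mathcal{F}$. Then

$$E\left[(z - E[z \mid \mathcal{F}^*])^2\right] \geq E\left[(z - E[z \mid \mathcal{F}])^2\right]. \qquad (7.13)$$

Proof of Lemma 7.1

$$\begin{aligned}
E\left[(z - E[z \mid \mathcal{F}^*])^2\right] &= E\left[(z - E[z \mid \mathcal{F}^*] - E[z \mid \mathcal{F}] + E[z \mid \mathcal{F}])^2\right] \\
&= E\left[(z - E[z \mid \mathcal{F}])^2\right] + E\left[(E[z \mid \mathcal{F}] - E[z \mid \mathcal{F}^*])^2\right] + 2E\left[(z - E[z \mid \mathcal{F}])(E[z \mid \mathcal{F}] - E[z \mid \mathcal{F}^*])\right].
\end{aligned}$$

Given that $E\left[(E[z \mid \mathcal{F}] - E[z \mid \mathcal{F}^*])^2\right] \geq 0$, the inequality (7.13) follows if we show that

$$E\left[(z - E[z \mid \mathcal{F}])(E[z \mid \mathcal{F}] - E[z \mid \mathcal{F}^*])\right] = 0.$$

Indeed, by the tower property it suffices to check that the conditional expectation with respect to $\mathcal{F}$ vanishes:

$$\begin{aligned}
E\left[(z - E[z \mid \mathcal{F}])(E[z \mid \mathcal{F}] - E[z \mid \mathcal{F}^*]) \mid \mathcal{F}\right] &= E\left[z E[z \mid \mathcal{F}] - z E[z \mid \mathcal{F}^*] - E[z \mid \mathcal{F}]^2 + E[z \mid \mathcal{F}] E[z \mid \mathcal{F}^*] \mid \mathcal{F}\right] \\
&= E[z \mid \mathcal{F}]^2 - E[z \mid \mathcal{F}] E[z \mid \mathcal{F}^*] - E[z \mid \mathcal{F}]^2 + E[z \mid \mathcal{F}] E[z \mid \mathcal{F}^*] = 0. \quad ■
\end{aligned}$$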
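
The lemma can be illustrated on a finite probability space, where conditioning on a sigma-algebra generated by a partition reduces to averaging within each block. The Python sketch below uses eight equally likely outcomes and two nested partitions; the numerical values and the helper `conditional_expectation` are illustrative assumptions.

```python
import numpy as np

def conditional_expectation(z, labels):
    """E[z | partition] on a finite uniform probability space.

    labels[w] identifies the block of the partition that contains outcome w.
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(z)
    for block in np.unique(labels):
        mask = labels == block
        out[mask] = z[mask].mean()
    return out

# Eight equally likely outcomes; the fine partition refines the coarse one.
z = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
fine = np.array([0, 0, 1, 1, 2, 2, 3, 3])    # generates the larger sigma-algebra
coarse = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # generates the sub-sigma-algebra

e_fine = conditional_expectation(z, fine)
e_coarse = conditional_expectation(z, coarse)
mse_fine = np.mean((z - e_fine) ** 2)
mse_coarse = np.mean((z - e_coarse) ** 2)
cross_term = np.mean((z - e_fine) * (e_fine - e_coarse))
```

Here the cross term vanishes and the gap between the two mean square errors equals the mean of $(E[z \mid \mathcal{F}] - E[z \mid \mathcal{F}^*])^2$, which is the decomposition used in the proof.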
7.6 Proof of Proposition 4.4



Part (i) is a straightforward consequence of (2.5). Regarding (ii), we first have that 

K i-l 



Consequently, 



MSFE ( X:^^^ 1 = E 



K K i-l i-l 

^ ^ WiWj ^ ^ ^i%l)meT+i-l£T+j-r 
i=l j = l 1=0 m=0 

K i-l K-1 K i-l 

i— 1 J— 2+1 



i=l 1=0 



1=0 



7.7 Proof of Theorem 4.7

(i) It is a straightforward consequence of part (i) in Theorem 3.3.



(ii) By (4.7) we have that 

MSFE (x^^)=e\(xT+k-X, 



t+K . 









'P(h) 


P(h) ^ 




= E 




ipiST+h-i 








\h=l 


i=0 


i=h 



where $P(h) = T + h - 1 + r$. We now notice that



Hence, 

MSFE (x^^^ = E 



P(h)-i P(h)-i-j 

£T+h~i = X X TfjIpkST+h-i-j-k- (7-14) 
j=0 k=0 



K P(h) K P(h) P(h)-iP(h)-i-j 

X W/i X IpiET+h-i ~ X! ^'^ X! X! X! i^iT^ji^kST+h-i-j- 
Ji=l i=0 h=l i=h j=0 k=0 



E 



K P(h) 

X %pi£T+h- 



K if P(h') P(h) P(h)-iP(h)-i-j 

2 X X WhWh' X! X! X! X! i'l^iT^jipkST+h'-ieT+h- 

1=0 i=h j=0 k=0 



h=l h' = l 



r P P-i P-i-j 

^W/i^X X ^iTfjIpkST+h-i-j-k 
h=l i=h j=0 k=0 



= E 



K K 



P{h)P(h') 
h=l h' = l 1=0 i'=0 

(7.15) 

P(h) P(h') P(h)-iP{h)-i-j 

EE™' EE E E ^'^^^ 

/i=l h' = l (=0 i=h' j=0 k=0 

K K P(h) P(h)-i P(h)-i-j P{h') P(h')-i' P(h')-i' 

X X EE E E E E i-'kipk'i-'iT^j^Ji'T^f£T+h-i-j-kST+h'-i'-j 

i=h j=0 k=0 i'=h' j'=0 k'=0 



E 



K K 



Sh-Lh'-i- 



h=l h' = l 



fT^ < w, (A + S + C)w >, 



(7.16) 
where A, B, C are the matrices with components given by

P(h)p(h') 

Ahh' X! X! ^i'^i'^h-iM'-i', 
i=0 i'=0 



(7.17) 



P{h} P{h') P(h)-i P(h)-i-j 

Bhh' = ~2 ^ ^ ^ ^ i'lil^kE ipiTTj 

1=0 i=h' i=0 k=0 



•'h — l^h'—i—j — kj 



(7.18) 



P(h) P(h)-i P{h)-i-j P{h') P(h')-i' P(h')-i'-j' 

^-' = E E E E E E M^-E 

i=h 3=0 k=0 i'=h' j'=0 k'=0 



and P{h) =T + h-l + r, Pih') = T + h' - 1 



^h—i—j — k,h' — i'—j' — k' 5 

(7.19) 



(iii) We recall that by (4.7), the forecast X^^^^ is given by 



K Pih) 

^T+K = E ^'^ E ^i^T+h-i- 
h' = l i=h 



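According to (3.2), both $\hat{\psi}_i$ and $\hat{\pi}_j$ can be asymptotically written as

$$\hat{\psi}_i = \psi_i + \frac{r_i}{\sqrt{T}} \quad \text{and} \quad \hat{\pi}_j = \pi_j + \frac{t_j}{\sqrt{T}},$$

with $r_i$ and $t_j$ Gaussian random variables of mean 0 and variances $(\Sigma_{\Xi_P})_{i,i}$ and $(\Sigma_{\Xi_P})_{j+P(K),\,j+P(K)}$,
respectively. Consequently,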



TTj + 4= ) i^k^T+h-l-j-k 



K P(h) P(h)-i P{h)-i-j . 

x^k-Y.^kY. e e [^^^ 

h=l i=h j=0 fc=0 ^ 
K P{h) P{h)-i P{h)-i~j 

= + — ^ + — ^ + — eT+h-«-j-fe. 7.20 



h=l i=h j=0 k=0 

We now recall that 



T 



T 



T 



P{h)-i P{h)-i-j 

T^j^^k^T+h-i-j-k — £T+h-ii 

j=0 k=Q 



and we eliminate in (7.20) the term that decays as 1/T. We hence approximate X^^j^ as 

p{h)-iP{h)-i-j 



K P{h) 

X^+K - E "''^ E 
h—1 i — h 



£T+h-i 



E E 



i^iipktj 



j=0 k=0 



-£T+h-i-j-k 



K 
h=l 



P{h) 



P{h) P{h)-i P(h)-i-j 



E i'-^T+h-: + E E E 

i=h i=h J— k=0 



-ST+h-i-j-k 



(7.21) 
Using this approximation we now compute the MSFE:

2" 



MSFE 



E 



E 



E 



E 



E 



2E 



K 

, h=l 
K 

^h=l 
K K 



P(h) 



P(h) 



P{h) P(h)-iP(h)-i-j 



1=0 i=h i=h j=0 k=0 ^ 

h-l P(h) ^ P{h)P{h)^iP{h)-i-] ^ ^^^ 



=0 i=h 
h-l h'-l 



i=h j=0 k=0 



^T+h-i-j-k 



whWh' ^ ^ ipiipiieT+h-iST+h' 



h=l h' = l 
K K 



i=0 i'=0 
P{h)P(h') 



whWh' X! X! ~ '^^) (^^*' ~ ^'') £T+h-i£T+h 

h—l h' — l i—h i'—h' 

' K K P{h) P{h)-iP{h)-i-j P{h') P{h')-i' P{h')-i'-j' 

EE«E E E E E E 

h=l h' = l i=h j=a k=0 i'=h' j'=0 fe'=0 

K K P{h) P{h') P(,h')~i' P(.h')'i'']' 



i^itpklpi'lpk'tjty 

T 



£T+h-i-j-k£T+h'-i 



h=l h' = l 



i—h i' — h' j'—O k'—O 



Using Lemma 3.2 this expression can be asymptotically approximated by: 

MSFE (x^) = ^ w, (^"''°'' + D + F + G)w>, 
where A'^^"''^ , D, F, G are matrices whose components are given by 

h-l h'-l 
i=0 i'=0 



P(h)p(h') 

Dhh' = ^J2 E (Ssp)m''^'^- 

i — h i'—h' 

P{h) P{h') Pih')--! P{h')~i'-j' 



i,h' —i' ? 



-F/ih' - E E E E i^i'i^k' {^sp),^p(^K)+j' ^h-z,h'-i'-j'-k', 



i—h i' — h' j'—O k'—Q 

^ P{h) P(h)-i P[h)-i-j P(h') P(h')-i' P(h')-i'-j' 

Ghh' = ^EE EEE E AMt'i'k' (Sh 

i=h j=0 fe=0 i'=h' j'=0 k'=0 



p)]+P(K),j'+P(K) 



^h—i—j — k.h' — i' — j' — k' 